Introduction

“A problem is a chance for you to do your best.” Duke Ellington

In this report we explore and prepare data on patients admitted to the ICU with problems related to heart disease. The goal of the study is to explore the data and to identify and fix data-quality issues, so that we obtain a clean input for selecting or transforming the features that carry the most information for the machine learning model in phase 2.

First look at the dataset

Importing data

In this exercise we use the MIMIC-IV dataset, a publicly available database containing data on patient measurements, diagnoses, and other classification information.

Description of variables

For this first part, we are going to explore the general characteristics of the dataset.

How many records do we have? How many variables?

## 
## 
## |    x|
## |----:|
## | 6377|
## |  207|

Comments: For this particular version of the data we have 6377 observations and 207 variables.
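
As a minimal sketch of how these counts can be obtained in R, assuming the data sit in a data frame: `patients` below is a hypothetical three-row stand-in with made-up values, not the study data.

```r
# Hypothetical stand-in for the study data (values are made up).
patients <- data.frame(
  subject_id = c(10000001L, 10000002L, 10000003L),
  gender     = factor(c("F", "M", "M")),
  age        = c(61, 70, 79)
)

# dim() returns c(number of rows, number of columns).
dims        <- dim(patients)
n_records   <- nrow(patients)
n_variables <- ncol(patients)
```

Applied to the real data frame, the same calls would be expected to return the record and variable counts reported above.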

What are the variable names? Are they meaningful?

## 
## 
## |x                                                                 |
## |:-----------------------------------------------------------------|
## |subject_id                                                        |
## |gender                                                            |
## |age                                                               |
## |mortality                                                         |
## |ethnicity                                                         |
## |Heart.Rate                                                        |
## |Heart.rate.Alarm...High                                           |
## |Heart.Rate.Alarm...Low                                            |
## |Arterial.Blood.Pressure.systolic                                  |
## |Non.Invasive.Blood.Pressure.systolic                              |
## |Arterial.Blood.Pressure.diastolic                                 |
## |Non.Invasive.Blood.Pressure.diastolic                             |
## |Respiratory.Rate                                                  |
## |Respiratory.Rate..Set.                                            |
## |Respiratory.Rate..spontaneous.                                    |
## |Respiratory.Rate..Total.                                          |
## |SpO2.Desat.Limit                                                  |
## |INR                                                               |
## |Prothrombin.time                                                  |
## |Anion.gap                                                         |
## |Creatinine..serum.                                                |
## |Temperature                                                       |
## |Potassium..Whole.Blood.2                                          |
## |Potassium..whole.blood.                                           |
## |Sodium..whole.blood.                                              |
## |Sodium..Whole.Blood                                               |
## |Chloride..Whole.Blood                                             |
## |Chloride..whole.blood.                                            |
## |Bicarbonate                                                       |
## |Glucose..whole.blood.                                             |
## |GCS...Eye.Opening                                                 |
## |Hemoglobin                                                        |
## |Hemoglobin.2                                                      |
## |Hematocrit                                                        |
## |Platelet.Count                                                    |
## |Acute.myocardial.infarction.of.anterolateral.wall..episode.of.c   |
## |Acute.myocardial.infarction.of.anterolateral.wall..initial.epis   |
## |Acute.myocardial.infarction.of.anterolateral.wall..subsequent.e   |
## |Acute.myocardial.infarction.of.other.anterior.wall..episode.of.   |
## |Acute.myocardial.infarction.of.other.anterior.wall..initial.epi   |
## |Acute.myocardial.infarction.of.other.anterior.wall..subsequent.   |
## |Acute.myocardial.infarction.of.inferolateral.wall..episode.of.c   |
## |Acute.myocardial.infarction.of.inferolateral.wall..initial.epis   |
## |Acute.myocardial.infarction.of.inferolateral.wall..subsequent.e   |
## |Acute.myocardial.infarction.of.inferoposterior.wall..episode.of   |
## |Acute.myocardial.infarction.of.inferoposterior.wall..initial.ep   |
## |Acute.myocardial.infarction.of.inferoposterior.wall..subsequent   |
## |Acute.myocardial.infarction.of.other.inferior.wall..episode.of.   |
## |Acute.myocardial.infarction.of.other.inferior.wall..initial.epi   |
## |Acute.myocardial.infarction.of.other.inferior.wall..subsequent.   |
## |Acute.myocardial.infarction.of.other.lateral.wall..episode.of.c   |
## |Acute.myocardial.infarction.of.other.lateral.wall..initial.epis   |
## |Acute.myocardial.infarction.of.other.lateral.wall..subsequent.e   |
## |Acute.myocardial.infarction.of.other.specified.sites..episode.o   |
## |Acute.myocardial.infarction.of.other.specified.sites..initial.e   |
## |Acute.myocardial.infarction.of.other.specified.sites..subsequen   |
## |Acute.myocardial.infarction.of.unspecified.site..episode.of.car   |
## |Acute.myocardial.infarction.of.unspecified.site..initial.episod   |
## |Acute.myocardial.infarction.of.unspecified.site..subsequent.epi   |
## |Postmyocardial.infarction.syndrome                                |
## |Acute.coronary.occlusion.without.myocardial.infarction            |
## |Old.myocardial.infarction                                         |
## |Certain.sequelae.of.myocardial.infarction..not.elsewhere.classi   |
## |Acute.myocardial.infarction                                       |
## |ST.elevation..STEMI..myocardial.infarction.of.anterior.wall       |
## |ST.elevation..STEMI..myocardial.infarction.involving.left.main    |
## |ST.elevation..STEMI..myocardial.infarction.involving.left.anter   |
## |ST.elevation..STEMI..myocardial.infarction.involving.other.coro   |
## |ST.elevation..STEMI..myocardial.infarction.of.inferior.wall       |
## |ST.elevation..STEMI..myocardial.infarction.involving.right.coro   |
## |ST.elevation..STEMI..myocardial.infarction.involving.other.coro.2 |
## |ST.elevation..STEMI..myocardial.infarction.of.other.sites         |
## |ST.elevation..STEMI..myocardial.infarction.involving.left.circu   |
## |ST.elevation..STEMI..myocardial.infarction.involving.other.site   |
## |ST.elevation..STEMI..myocardial.infarction.of.unspecified.site    |
## |Non.ST.elevation..NSTEMI..myocardial.infarction                   |
## |Acute.myocardial.infarction..unspecified                          |
## |Other.type.of.myocardial.infarction                               |
## |Myocardial.infarction.type.2                                      |
## |Other.myocardial.infarction.type                                  |
## |Subsequent.ST.elevation..STEMI..and.non.ST.elevation..NSTEMI..m   |
## |Subsequent.ST.elevation..STEMI..myocardial.infarction.of.anteri   |
## |Subsequent.ST.elevation..STEMI..myocardial.infarction.of.inferi   |
## |Subsequent.non.ST.elevation..NSTEMI..myocardial.infarction        |
## |Subsequent.ST.elevation..STEMI..myocardial.infarction.of.other    |
## |Subsequent.ST.elevation..STEMI..myocardial.infarction.of.unspec   |
## |Certain.current.complications.following.ST.elevation..STEMI..an   |
## |Hemopericardium.as.current.complication.following.acute.myocard   |
## |Atrial.septal.defect.as.current.complication.following.acute.my   |
## |Ventricular.septal.defect.as.current.complication.following.acu   |
## |Rupture.of.cardiac.wall.without.hemopericardium.as.current.comp   |
## |Rupture.of.chordae.tendineae.as.current.complication.following    |
## |Rupture.of.papillary.muscle.as.current.complication.following.a   |
## |Thrombosis.of.atrium..auricular.appendage..and.ventricle.as.cur   |
## |Other.current.complications.following.acute.myocardial.infarcti   |
## |Acute.coronary.thrombosis.not.resulting.in.myocardial.infarctio   |
## |Old.myocardial.infarction.2                                       |
## |Rheumatic.heart.failure..congestive.                              |
## |Congestive.heart.failure..unspecified                             |
## |Systolic..congestive..heart.failure                               |
## |Unspecified.systolic..congestive..heart.failure                   |
## |Acute.systolic..congestive..heart.failure                         |
## |Chronic.systolic..congestive..heart.failure                       |
## |Acute.on.chronic.systolic..congestive..heart.failure              |
## |Diastolic..congestive..heart.failure                              |
## |Unspecified.diastolic..congestive..heart.failure                  |
## |Acute.diastolic..congestive..heart.failure                        |
## |Chronic.diastolic..congestive..heart.failure                      |
## |Acute.on.chronic.diastolic..congestive..heart.failure             |
## |Combined.systolic..congestive..and.diastolic..congestive..heart   |
## |Unspecified.combined.systolic..congestive..and.diastolic..conge   |
## |Acute.combined.systolic..congestive..and.diastolic..congestive.   |
## |Chronic.combined.systolic..congestive..and.diastolic..congestiv   |
## |Acute.on.chronic.combined.systolic..congestive..and.diastolic..   |
## |Atrial.fibrillation                                               |
## |Atrial.fibrillation.and.flutter                                   |
## |Paroxysmal.atrial.fibrillation                                    |
## |Persistent.atrial.fibrillation                                    |
## |Longstanding.persistent.atrial.fibrillation                       |
## |Other.persistent.atrial.fibrillation                              |
## |Chronic.atrial.fibrillation                                       |
## |Chronic.atrial.fibrillation..unspecified                          |
## |Permanent.atrial.fibrillation                                     |
## |Unspecified.atrial.fibrillation.and.atrial.flutter                |
## |Unspecified.atrial.fibrillation                                   |
## |Other.chronic.obstructive.pulmonary.disease                       |
## |Chronic.obstructive.pulmonary.disease.with..acute..lower.respir   |
## |Chronic.obstructive.pulmonary.disease.with..acute..exacerbation   |
## |Chronic.obstructive.pulmonary.disease..unspecified                |
## |Heat.stroke.and.sunstroke                                         |
## |Brain.stem.stroke.syndrome                                        |
## |Cerebellar.stroke.syndrome                                        |
## |National.Institutes.of.Health.Stroke.Scale..NIHSS..score          |
## |Heatstroke.and.sunstroke                                          |
## |Heatstroke.and.sunstroke.2                                        |
## |Heatstroke.and.sunstroke..initial.encounter                       |
## |Heatstroke.and.sunstroke..subsequent.encounter                    |
## |Heatstroke.and.sunstroke..sequela                                 |
## |Exertional.heatstroke                                             |
## |Exertional.heatstroke..initial.encounter                          |
## |Exertional.heatstroke..subsequent.encounter                       |
## |Exertional.heatstroke..sequela                                    |
## |Other.heatstroke.and.sunstroke                                    |
## |Other.heatstroke.and.sunstroke..initial.encounter                 |
## |Other.heatstroke.and.sunstroke..subsequent.encounter              |
## |Other.heatstroke.and.sunstroke..sequela                           |
## |Heatstroke.and.sunstroke..initial.encounter.2                     |
## |Heatstroke.and.sunstroke..subsequent.encounter.2                  |
## |Heatstroke.and.sunstroke..sequela.2                               |
## |Family.history.of.stroke..cerebrovascular.                        |
## |Family.history.of.stroke                                          |
## |Mixed.hyperlipidemia                                              |
## |Other.and.unspecified.hyperlipidemia                              |
## |Mixed.hyperlipidemia.2                                            |
## |Other.hyperlipidemia                                              |
## |Other.hyperlipidemia.2                                            |
## |Hyperlipidemia..unspecified                                       |
## |Other.chronic.obstructive.pulmonary.disease.2                     |
## |Chronic.obstructive.pulmonary.disease.with..acute..lower.respir.2 |
## |Chronic.obstructive.pulmonary.disease.with..acute..exacerbation.2 |
## |Chronic.obstructive.pulmonary.disease..unspecified.2              |
## |Senile.dementia..uncomplicated                                    |
## |Presenile.dementia..uncomplicated                                 |
## |Presenile.dementia.with.delirium                                  |
## |Presenile.dementia.with.delusional.features                       |
## |Presenile.dementia.with.depressive.features                       |
## |Senile.dementia.with.delusional.features                          |
## |Senile.dementia.with.depressive.features                          |
## |Senile.dementia.with.delirium                                     |
## |Vascular.dementia..uncomplicated                                  |
## |Vascular.dementia..with.delirium                                  |
## |Vascular.dementia..with.delusions                                 |
## |Vascular.dementia..with.depressed.mood                            |
## |Alcohol.induced.persisting.dementia                               |
## |Drug.induced.persisting.dementia                                  |
## |Dementia.in.conditions.classified.elsewhere.without.behavioral    |
## |Dementia.in.conditions.classified.elsewhere.with.behavioral.dis   |
## |Dementia..unspecified..without.behavioral.disturbance             |
## |Dementia..unspecified..with.behavioral.disturbance                |
## |Other.frontotemporal.dementia                                     |
## |Dementia.with.lewy.bodies                                         |
## |Vascular.dementia                                                 |
## |Vascular.dementia.2                                               |
## |Vascular.dementia.without.behavioral.disturbance                  |
## |Vascular.dementia.with.behavioral.disturbance                     |
## |Dementia.in.other.diseases.classified.elsewhere                   |
## |Dementia.in.other.diseases.classified.elsewhere.2                 |
## |Dementia.in.other.diseases.classified.elsewhere.without.behavio   |
## |Dementia.in.other.diseases.classified.elsewhere.with.behavioral   |
## |Unspecified.dementia                                              |
## |Unspecified.dementia.2                                            |
## |Unspecified.dementia.without.behavioral.disturbance               |
## |Unspecified.dementia.with.behavioral.disturbance                  |
## |Alcohol.dependence.with.alcohol.induced.persisting.dementia       |
## |Alcohol.use..unspecified.with.alcohol.induced.persisting.dement   |
## |Sedative..hypnotic.or.anxiolytic.dependence.with.sedative..hypn   |
## |Sedative..hypnotic.or.anxiolytic.use..unspecified.with.sedative   |
## |Inhalant.abuse.with.inhalant.induced.dementia                     |
## |Inhalant.dependence.with.inhalant.induced.dementia                |
## |Inhalant.use..unspecified.with.inhalant.induced.persisting.deme   |
## |Other.psychoactive.substance.abuse.with.psychoactive.substance.   |
## |Other.psychoactive.substance.dependence.with.psychoactive.subst   |
## |Other.psychoactive.substance.use..unspecified.with.psychoactive   |
## |Frontotemporal.dementia                                           |
## |Other.frontotemporal.dementia.2                                   |
## |Dementia.with.Lewy.bodies                                         |
## |Age.Group                                                         |

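Comments: The variable names refer to demographic classification, vital-sign measurements, lab tests, and diagnoses of patient conditions or medical history. Note that several diagnosis names are cut off at 63 characters, and some variables appear more than once, differing only in capitalization, trailing dots, or a numeric suffix (for example Hemoglobin and Hemoglobin.2).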
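
The name listing can also be screened programmatically. The sketch below uses a handful of names copied from the listing above; the 63-character threshold and the normalisation rules (lower-casing, dropping a numeric suffix and trailing dots) are assumptions chosen for illustration, not rules taken from the original analysis.

```r
# A few variable names copied from the listing above.
vars <- c(
  "Hemoglobin", "Hemoglobin.2",
  "Sodium..whole.blood.", "Sodium..Whole.Blood",
  "Potassium..Whole.Blood.2", "Potassium..whole.blood.",
  "Acute.myocardial.infarction.of.anterolateral.wall..episode.of.c",
  "Temperature"
)

# Names that reach the assumed 63-character cut-off are probably truncated.
truncated <- vars[nchar(vars) >= 63]

# Normalise names so that near-duplicates collapse to the same stem.
stem <- tolower(vars)
stem <- sub("\\.[0-9]+$", "", stem)  # drop a numeric suffix such as ".2"
stem <- sub("\\.+$", "", stem)       # drop trailing dots

# Keep only stems shared by more than one variable.
near_dups <- split(vars, stem)
near_dups <- near_dups[lengths(near_dups) > 1]
```

Each element of `near_dups` is a group of columns that may hold the same measurement and is worth comparing before modelling.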

What type is each variable—e.g., numeric, categorical, discrete, or logical?

Comments:

How many unique values does each variable have?

What value occurs most frequently, and how often does it occur?
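
The three questions above (type, number of unique values, most frequent value) can be answered in one pass per variable. The following is a sketch only: `profile_variable` is a hypothetical helper and `toy` a made-up data frame, not the study data.

```r
# Hypothetical helper: one-row profile of a single variable.
profile_variable <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  if (length(counts) == 0) {
    # Column with no observed values at all.
    top_value <- NA_character_
    top_count <- 0L
  } else {
    top_value <- names(counts)[which.max(counts)]
    top_count <- as.integer(max(counts))
  }
  data.frame(
    type          = class(x)[1],
    n_unique      = length(unique(x[!is.na(x)])),
    most_frequent = top_value,
    frequency     = top_count,
    stringsAsFactors = FALSE
  )
}

# Made-up example data.
toy <- data.frame(
  gender = factor(c("F", "M", "M", "M")),
  age    = c(61, 70, 70, NA)
)

# One row per variable, named after the column.
profile <- do.call(rbind, lapply(toy, profile_variable))
```

When several values tie for the highest count, `which.max` reports only the first one, so ties would need separate handling.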

Are there missing observations (vertically and horizontally)? If so, how frequently does this occur?

## The number of total missing values is: 37341
## The percentage of mortality is: 0.152 --> In the original dataset.
Missing Value Summary

|variable                              | missing_count| missing_percentage|
|:-------------------------------------|-------------:|------------------:|
|Temperature                           |          3840|         60.2164027|
|Chloride..whole.blood.                |          3569|         55.9667555|
|Sodium..whole.blood.                  |          3362|         52.7207151|
|Glucose..whole.blood.                 |          2953|         46.3070409|
|Potassium..whole.blood.               |          2802|         43.9391563|
|Arterial.Blood.Pressure.systolic      |          2478|         38.8583974|
|Arterial.Blood.Pressure.diastolic     |          2477|         38.8427160|
|Respiratory.Rate..Set.                |          2470|         38.7329465|
|Hemoglobin                            |          2357|         36.9609534|
|Respiratory.Rate..spontaneous.        |          2300|         36.0671162|
|Respiratory.Rate..Total.              |          2286|         35.8475772|
|Chloride..Whole.Blood                 |          2267|         35.5496315|
|Sodium..Whole.Blood                   |          2121|         33.2601537|
|Potassium..Whole.Blood.2              |          1441|         22.5968324|
|INR                                   |           236|          3.7007997|
|Prothrombin.time                      |           236|          3.7007997|
|Non.Invasive.Blood.Pressure.systolic  |            45|          0.7056610|
|Non.Invasive.Blood.Pressure.diastolic |            45|          0.7056610|
|Anion.gap                             |            17|          0.2665830|
|Creatinine..serum.                    |            14|          0.2195390|
|SpO2.Desat.Limit                      |            13|          0.2038576|
|Heart.Rate.Alarm...Low                |             4|          0.0627254|
|Heart.rate.Alarm...High               |             3|          0.0470441|
|Bicarbonate                           |             3|          0.0470441|
|Respiratory.Rate                      |             2|          0.0313627|
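
A summary of this kind can be built from `is.na()`. The sketch below counts missing values by column (vertically) and by row (horizontally); `toy` is a made-up data frame used only to keep the example self-contained.

```r
# Made-up example data with a few missing values.
toy <- data.frame(
  Temperature = c(36.8, NA, NA, 37.2),
  INR         = c(1.2, 1.4, NA, 1.8),
  age         = c(61, 70, 79, 55)
)

# Vertical view: missing values per variable, as count and percentage.
missing_count <- colSums(is.na(toy))
missing_summary <- data.frame(
  missing_count      = missing_count,
  missing_percentage = 100 * missing_count / nrow(toy)
)

# Keep only variables with missing values, most affected first.
missing_summary <- missing_summary[missing_summary$missing_count > 0, ]
missing_summary <- missing_summary[order(-missing_summary$missing_count), ]

# Horizontal view: missing values per record.
missing_per_row <- rowSums(is.na(toy))

# Overall total.
total_missing <- sum(is.na(toy))
```

The per-row counts help to decide whether a few records account for most of the gaps or whether the missingness is spread across patients.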

Summary of demographic, vital signs and lab tests

##    subject_id       gender        age        mortality   
##  Min.   :10002430   F:2396   Min.   :19.00   Alive:5408  
##  1st Qu.:12494493   M:3981   1st Qu.:61.00   Death: 969  
##  Median :14959313            Median :70.00               
##  Mean   :14975796            Mean   :69.31               
##  3rd Qu.:17439137            3rd Qu.:79.00               
##  Max.   :19997293            Max.   :91.00               
##                                                          
##                   ethnicity      Heart.Rate    Heart.rate.Alarm...High
##  WHITE                 :4263   Min.   : 43.0   Min.   :    60.0       
##  UNKNOWN               : 693   1st Qu.: 95.0   1st Qu.:   120.0       
##  BLACK/AFRICAN AMERICAN: 428   Median :109.0   Median :   130.0       
##  OTHER                 : 180   Mean   :113.1   Mean   :   260.8       
##  WHITE - OTHER EUROPEAN: 165   3rd Qu.:128.0   3rd Qu.:   133.8       
##  WHITE - RUSSIAN       : 102   Max.   :295.0   Max.   :165130.0       
##  (Other)               : 546                   NA's   :3              
##  Heart.Rate.Alarm...Low Arterial.Blood.Pressure.systolic
##  Min.   :   40.0        Min.   :  0.0                   
##  1st Qu.:   50.0        1st Qu.:136.0                   
##  Median :   60.0        Median :150.0                   
##  Mean   :  168.8        Mean   :154.5                   
##  3rd Qu.:   60.0        3rd Qu.:168.0                   
##  Max.   :60120.0        Max.   :742.0                   
##  NA's   :4              NA's   :2478                    
##  Non.Invasive.Blood.Pressure.systolic Arterial.Blood.Pressure.diastolic
##  Min.   :   56.0                      Min.   :    1.0                  
##  1st Qu.:  135.0                      1st Qu.:   69.0                  
##  Median :  153.0                      Median :   78.0                  
##  Mean   :  160.8                      Mean   :  174.7                  
##  3rd Qu.:  171.0                      3rd Qu.:   92.0                  
##  Max.   :15878.0                      Max.   :91100.0                  
##  NA's   :45                           NA's   :2477                     
##  Non.Invasive.Blood.Pressure.diastolic Respiratory.Rate  Respiratory.Rate..Set.
##  Min.   :    41.0                      Min.   :  15.00   Min.   :   0.00       
##  1st Qu.:    82.0                      1st Qu.:  27.00   1st Qu.:  16.00       
##  Median :    98.0                      Median :  32.00   Median :  18.00       
##  Mean   :   215.9                      Mean   :  34.78   Mean   :  20.02       
##  3rd Qu.:   115.0                      3rd Qu.:  38.00   3rd Qu.:  20.00       
##  Max.   :105125.0                      Max.   :2037.00   Max.   :1618.00       
##  NA's   :45                            NA's   :2         NA's   :2470          
##  Respiratory.Rate..spontaneous. Respiratory.Rate..Total. SpO2.Desat.Limit
##  Min.   :   0.00                Min.   :   0.00          Min.   : 85.00  
##  1st Qu.:  13.00                1st Qu.:  18.00          1st Qu.: 85.00  
##  Median :  21.00                Median :  23.00          Median : 88.00  
##  Mean   :  20.81                Mean   :  25.74          Mean   : 89.38  
##  3rd Qu.:  28.00                3rd Qu.:  29.00          3rd Qu.: 88.00  
##  Max.   :1918.00                Max.   :3634.00          Max.   :920.00  
##  NA's   :2300                   NA's   :2286             NA's   :13      
##       INR           Prothrombin.time     Anion.gap        Creatinine..serum.
##  Min.   :     0.9   Min.   :     9.3   Min.   :     7.0   Min.   :     0.3  
##  1st Qu.:     1.2   1st Qu.:    13.5   1st Qu.:    14.0   1st Qu.:     0.9  
##  Median :     1.4   Median :    15.6   Median :    16.0   Median :     1.2  
##  Mean   :  5049.9   Mean   :  4905.8   Mean   :   174.5   Mean   :   944.9  
##  3rd Qu.:     1.8   3rd Qu.:    19.9   3rd Qu.:    20.0   3rd Qu.:     2.1  
##  Max.   :999999.0   Max.   :999999.0   Max.   :999999.0   Max.   :999999.0  
##  NA's   :236        NA's   :236        NA's   :17         NA's   :14        
##   Temperature    Potassium..Whole.Blood.2 Potassium..whole.blood.
##  Min.   :32.20   Min.   :  1.800          Min.   :     2.1       
##  1st Qu.:36.80   1st Qu.:  4.400          1st Qu.:     4.3       
##  Median :37.20   Median :  5.000          Median :     4.9       
##  Mean   :37.39   Mean   :  5.124          Mean   :  1403.6       
##  3rd Qu.:37.90   3rd Qu.:  5.600          3rd Qu.:     5.5       
##  Max.   :40.70   Max.   :134.000          Max.   :999999.0       
##  NA's   :3840    NA's   :1441             NA's   :2802           
##  Sodium..whole.blood. Sodium..Whole.Blood Chloride..Whole.Blood
##  Min.   :   118.0     Min.   :115.0       Min.   : 71.0        
##  1st Qu.:   135.0     1st Qu.:136.0       1st Qu.:103.0        
##  Median :   137.0     Median :138.0       Median :106.0        
##  Mean   :   800.3     Mean   :138.3       Mean   :106.1        
##  3rd Qu.:   139.0     3rd Qu.:140.0       3rd Qu.:109.0        
##  Max.   :999999.0     Max.   :187.0       Max.   :139.0        
##  NA's   :3362         NA's   :2121        NA's   :2267         
##  Chloride..whole.blood.  Bicarbonate   Glucose..whole.blood. GCS...Eye.Opening
##  Min.   :    11.0       Min.   :13.0   Min.   :     35       1:  72           
##  1st Qu.:   104.0       1st Qu.:28.0   1st Qu.:    147       2:  47           
##  Median :   107.0       Median :30.0   Median :    173       3:  53           
##  Mean   :   462.7       Mean   :30.7   Mean   :   2312       4:6205           
##  3rd Qu.:   109.0       3rd Qu.:33.0   3rd Qu.:    210                        
##  Max.   :999999.0       Max.   :51.0   Max.   :1276100                        
##  NA's   :3569           NA's   :3      NA's   :2953                           
##    Hemoglobin     Hemoglobin.2     Hematocrit    Platelet.Count  
##  Min.   : 0.00   Min.   : 5.10   Min.   :18.10   Min.   :   9.0  
##  1st Qu.:10.70   1st Qu.:12.30   1st Qu.:37.60   1st Qu.: 231.0  
##  Median :12.10   Median :13.60   Median :41.40   Median : 304.0  
##  Mean   :12.02   Mean   :13.52   Mean   :41.14   Mean   : 339.9  
##  3rd Qu.:13.40   3rd Qu.:14.80   3rd Qu.:44.60   3rd Qu.: 408.0  
##  Max.   :97.00   Max.   :22.60   Max.   :69.70   Max.   :2660.0  
##  NA's   :2357

Graphical Review

Outliers exploration
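
Several lab variables in the summary above have a maximum of 999999, which looks like a placeholder code rather than a measurement. The sketch below treats that value as missing (an assumption, not something the data documentation confirms here) and then flags outliers with the usual 1.5 × IQR rule. `flag_outliers` and the `inr` values are hypothetical.

```r
# Hypothetical helper: TRUE for values outside the Tukey fences.
# Values equal to `sentinel` are treated as missing, not as outliers.
flag_outliers <- function(x, sentinel = 999999, k = 1.5) {
  x[!is.na(x) & x == sentinel] <- NA
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# Made-up INR-like values, including one extreme value and one sentinel.
inr   <- c(0.9, 1.2, 1.4, 1.8, 1.3, 1.5, 12.0, 999999)
flags <- flag_outliers(inr)
```

Recoding the sentinel first matters: left in place, it would dominate the mean, as the summary table shows for INR and prothrombin time.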

Distributions assessment

Histograms and QQ Plots for Numerical variables

Comments:

Bar plots for Categorical variables

Comments:

Variables Relationship Review

Correlation for Numerical variables

Comments: In this part we should look more closely at the variable pairs with the highest correlation, using paired scatter plots and cor.test.
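
A possible starting point for that follow-up is sketched below, using made-up vital-sign values. Pairwise-complete observations are used because many variables have missing values, and a Spearman test is added as a rank-based check that is less sensitive to extreme values; both choices are assumptions rather than the method used in this report.

```r
# Made-up vital-sign values with some missing entries.
vitals <- data.frame(
  systolic   = c(110, 120, 135, 150, 168, 180, NA),
  diastolic  = c(60, 69, 78, 85, 92, 101, 80),
  heart_rate = c(95, 128, 88, 109, 101, NA, 113)
)

# Correlation matrix using, for each pair, the rows where both are present.
cor_matrix <- cor(vitals, use = "pairwise.complete.obs")

# Significance tests for one pair of interest.
pearson  <- cor.test(vitals$systolic, vitals$diastolic)
spearman <- cor.test(vitals$systolic, vitals$diastolic, method = "spearman")

# Scatter plot of the same pair (uncomment when running interactively).
# plot(vitals$systolic, vitals$diastolic)
```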

Correlation for Categorical variables

Mortality vs Gender

## [1] "Chi-Square correlation for  gender  vs  mortality"
##    Y
## X   Alive Death
##   F  1992   404
##   M  3416   565
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  stu_data
## X-squared = 8.0629, df = 1, p-value = 0.004518
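
The test above can be reproduced directly from the printed counts. This sketch rebuilds the 2 × 2 table by hand (the object names are illustrative) and also computes the death proportion per gender, which the test statistic alone does not show.

```r
# Counts copied from the gender vs mortality table above.
gender_mortality <- matrix(
  c(1992, 404,
    3416, 565),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("F", "M"), mortality = c("Alive", "Death"))
)

# For 2 x 2 tables chisq.test() applies Yates' continuity correction by default.
test <- chisq.test(gender_mortality)

# Proportion of deaths within each gender.
death_rate <- gender_mortality[, "Death"] / rowSums(gender_mortality)
```

The proportions come out at roughly 16.9% for women and 14.2% for men, so the significant result reflects a modest difference in a large sample.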

Mortality vs Ethnicity

## [1] "Chi-Square correlation for  ethnicity  vs  mortality"
##                                            Y
## X                                           Alive Death
##   AMERICAN INDIAN/ALASKA NATIVE                 9     1
##   ASIAN                                        41    10
##   ASIAN - ASIAN INDIAN                         16     1
##   ASIAN - CHINESE                              56     8
##   ASIAN - KOREAN                                3     0
##   ASIAN - SOUTH EAST ASIAN                     12     4
##   BLACK/AFRICAN                                11     6
##   BLACK/AFRICAN AMERICAN                      339    89
##   BLACK/CAPE VERDEAN                           16     6
##   BLACK/CARIBBEAN ISLAND                       24     6
##   HISPANIC OR LATINO                           16     3
##   HISPANIC/LATINO - CENTRAL AMERICAN            1     0
##   HISPANIC/LATINO - COLUMBIAN                   7     0
##   HISPANIC/LATINO - CUBAN                       4     0
##   HISPANIC/LATINO - DOMINICAN                  34     6
##   HISPANIC/LATINO - GUATEMALAN                  6     0
##   HISPANIC/LATINO - HONDURAN                    6     1
##   HISPANIC/LATINO - MEXICAN                     2     0
##   HISPANIC/LATINO - PUERTO RICAN               61    16
##   HISPANIC/LATINO - SALVADORAN                  1     1
##   MULTIPLE RACE/ETHNICITY                       1     0
##   NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER    10     0
##   OTHER                                       148    32
##   PATIENT DECLINED TO ANSWER                   37     2
##   PORTUGUESE                                   12     5
##   SOUTH AMERICAN                                3     2
##   UNABLE TO OBTAIN                             42     8
##   UNKNOWN                                     553   140
##   WHITE                                      3687   576
##   WHITE - BRAZILIAN                             8     0
##   WHITE - EASTERN EUROPEAN                     17     4
##   WHITE - OTHER EUROPEAN                      149    16
##   WHITE - RUSSIAN                              76    26
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  stu_data
## X-squared = 78.166, df = 32, p-value = 9.777e-06
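
The warning above is raised when some expected cell counts are small, which is likely here because many ethnicity categories contain only a handful of patients. The sketch below illustrates this on four rows copied from the table (a subset chosen for illustration, so its p-value is not comparable to the full test) and shows one possible workaround, a Monte Carlo p-value. Merging sparse categories would be another option.

```r
# Four rows copied from the ethnicity vs mortality table above.
sparse_tab <- matrix(
  c(   3,   0,
       2,   0,
     553, 140,
    3687, 576),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    ethnicity = c("ASIAN - KOREAN", "HISPANIC/LATINO - MEXICAN", "UNKNOWN", "WHITE"),
    mortality = c("Alive", "Death")
  )
)

# Expected counts under independence; values below 5 trigger the warning.
expected <- suppressWarnings(chisq.test(sparse_tab))$expected
n_small  <- sum(expected < 5)

# Monte Carlo p-value, which does not rely on the chi-squared approximation.
set.seed(1)
sim_test <- chisq.test(sparse_tab, simulate.p.value = TRUE, B = 2000)
```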

Mortality vs GCS Eye Opening

## [1] "Chi-Square correlation for  GCS...Eye.Opening  vs  mortality"
##    Y
## X   Alive Death
##   1     6    66
##   2     5    42
##   3    17    36
##   4  5380   825
## 
##  Pearson's Chi-squared test
## 
## data:  stu_data
## X-squared = 659.09, df = 3, p-value < 2.2e-16
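
To see the direction of this association, the counts can be turned into row proportions. The sketch below rebuilds the table from the printed counts; the object names are illustrative.

```r
# Counts copied from the GCS eye opening vs mortality table above.
gcs_mortality <- matrix(
  c(   6,  66,
       5,  42,
      17,  36,
    5380, 825),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    gcs_eye_opening = c("1", "2", "3", "4"),
    mortality       = c("Alive", "Death")
  )
)

# Share of each outcome within every GCS category (rows sum to 1).
row_props <- prop.table(gcs_mortality, margin = 1)

# Death rate per category, in percent.
death_rate <- round(100 * row_props[, "Death"], 1)
```

The death rate falls from about 92% in category 1 to about 13% in category 4, although the first three categories together contain fewer than 200 patients.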

Mortality vs Age Group

## [1] "Chi-Square correlation for  Age.Group  vs  mortality"
##         Y
## X        Alive Death
##   19-35     90     9
##   36-50    382    46
##   51-65   1580   183
##   66-100  3356   731
## 
##  Pearson's Chi-squared test
## 
## data:  stu_data
## X-squared = 64.117, df = 3, p-value = 7.749e-14
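
An age grouping like the one used here can be derived from the numeric age with `cut()`. The break points below are inferred from the group labels and assume whole-year ages; they are not taken from the original code.

```r
# Made-up ages at the edges of each group.
age <- c(19, 35, 36, 50, 51, 65, 66, 91)

# Intervals are right-closed: (18, 35], (35, 50], (50, 65], (65, 100].
age_group <- cut(
  age,
  breaks = c(18, 35, 50, 65, 100),
  labels = c("19-35", "36-50", "51-65", "66-100")
)

# Number of patients per group.
group_counts <- table(age_group)
```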

Gender vs Ethnicity

## [1] "Chi-Square correlation for  ethnicity  vs  gender"
##                                            Y
## X                                              F    M
##   AMERICAN INDIAN/ALASKA NATIVE                5    5
##   ASIAN                                       17   34
##   ASIAN - ASIAN INDIAN                         3   14
##   ASIAN - CHINESE                             29   35
##   ASIAN - KOREAN                               1    2
##   ASIAN - SOUTH EAST ASIAN                     6   10
##   BLACK/AFRICAN                                9    8
##   BLACK/AFRICAN AMERICAN                     220  208
##   BLACK/CAPE VERDEAN                           8   14
##   BLACK/CARIBBEAN ISLAND                      18   12
##   HISPANIC OR LATINO                           7   12
##   HISPANIC/LATINO - CENTRAL AMERICAN           0    1
##   HISPANIC/LATINO - COLUMBIAN                  4    3
##   HISPANIC/LATINO - CUBAN                      1    3
##   HISPANIC/LATINO - DOMINICAN                 19   21
##   HISPANIC/LATINO - GUATEMALAN                 2    4
##   HISPANIC/LATINO - HONDURAN                   2    5
##   HISPANIC/LATINO - MEXICAN                    1    1
##   HISPANIC/LATINO - PUERTO RICAN              23   54
##   HISPANIC/LATINO - SALVADORAN                 0    2
##   MULTIPLE RACE/ETHNICITY                      0    1
##   NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER    5    5
##   OTHER                                       66  114
##   PATIENT DECLINED TO ANSWER                   9   30
##   PORTUGUESE                                   5   12
##   SOUTH AMERICAN                               2    3
##   UNABLE TO OBTAIN                            28   22
##   UNKNOWN                                    229  464
##   WHITE                                     1566 2697
##   WHITE - BRAZILIAN                            2    6
##   WHITE - EASTERN EUROPEAN                    13    8
##   WHITE - OTHER EUROPEAN                      59  106
##   WHITE - RUSSIAN                             37   65
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  stu_data
## X-squared = 81.94, df = 32, p-value = 2.93e-06

Gender vs GCS Eye Opening

## [1] "Chi-Square correlation for  GCS...Eye.Opening  vs  gender"
##    Y
## X      F    M
##   1   26   46
##   2   21   26
##   3   30   23
##   4 2319 3886
## 
##  Pearson's Chi-squared test
## 
## data:  stu_data
## X-squared = 9.3672, df = 3, p-value = 0.02479

Gender vs Age Group

## [1] "Chi-Square correlation for  Age.Group  vs  gender"
##         Y
## X           F    M
##   19-35    36   63
##   36-50   130  298
##   51-65   531 1232
##   66-100 1699 2388
## 
##  Pearson's Chi-squared test
## 
## data:  stu_data
## X-squared = 79.129, df = 3, p-value < 2.2e-16

GCS Eye Opening vs Ethnicity

## [1] "Chi-Square correlation for  ethnicity  vs  GCS...Eye.Opening"
##                                            Y
## X                                              1    2    3    4
##   AMERICAN INDIAN/ALASKA NATIVE                0    0    0   10
##   ASIAN                                        3    0    1   47
##   ASIAN - ASIAN INDIAN                         0    0    0   17
##   ASIAN - CHINESE                              1    0    1   62
##   ASIAN - KOREAN                               0    0    0    3
##   ASIAN - SOUTH EAST ASIAN                     0    1    0   15
##   BLACK/AFRICAN                                0    0    0   17
##   BLACK/AFRICAN AMERICAN                       3    4    3  418
##   BLACK/CAPE VERDEAN                           1    0    0   21
##   BLACK/CARIBBEAN ISLAND                       1    0    1   28
##   HISPANIC OR LATINO                           0    1    0   18
##   HISPANIC/LATINO - CENTRAL AMERICAN           0    0    0    1
##   HISPANIC/LATINO - COLUMBIAN                  0    0    0    7
##   HISPANIC/LATINO - CUBAN                      0    0    0    4
##   HISPANIC/LATINO - DOMINICAN                  0    0    0   40
##   HISPANIC/LATINO - GUATEMALAN                 0    0    0    6
##   HISPANIC/LATINO - HONDURAN                   0    0    0    7
##   HISPANIC/LATINO - MEXICAN                    0    0    0    2
##   HISPANIC/LATINO - PUERTO RICAN               3    0    0   74
##   HISPANIC/LATINO - SALVADORAN                 0    0    0    2
##   MULTIPLE RACE/ETHNICITY                      0    0    0    1
##   NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER    0    0    0   10
##   OTHER                                        3    1    3  173
##   PATIENT DECLINED TO ANSWER                   0    0    0   39
##   PORTUGUESE                                   0    0    0   17
##   SOUTH AMERICAN                               0    0    0    5
##   UNABLE TO OBTAIN                             0    1    1   48
##   UNKNOWN                                     27   16   11  639
##   WHITE                                       27   22   30 4184
##   WHITE - BRAZILIAN                            0    0    0    8
##   WHITE - EASTERN EUROPEAN                     0    0    0   21
##   WHITE - OTHER EUROPEAN                       1    0    2  162
##   WHITE - RUSSIAN                              2    1    0   99
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  stu_data
## X-squared = 143.13, df = 96, p-value = 0.001304

Age Group vs Ethnicity

## [1] "Chi-square test for  ethnicity  vs  Age.Group"
##                                            Y
## X                                           19-35 36-50 51-65 66-100
##   AMERICAN INDIAN/ALASKA NATIVE                 0     0     3      7
##   ASIAN                                         2     4    14     31
##   ASIAN - ASIAN INDIAN                          0     1     6     10
##   ASIAN - CHINESE                               0     5    14     45
##   ASIAN - KOREAN                                0     0     0      3
##   ASIAN - SOUTH EAST ASIAN                      1     1     4     10
##   BLACK/AFRICAN                                 1     3     1     12
##   BLACK/AFRICAN AMERICAN                       20    41   135    232
##   BLACK/CAPE VERDEAN                            1     1     6     14
##   BLACK/CARIBBEAN ISLAND                        2     2    10     16
##   HISPANIC OR LATINO                            0     0    10      9
##   HISPANIC/LATINO - CENTRAL AMERICAN            0     0     0      1
##   HISPANIC/LATINO - COLUMBIAN                   0     0     1      6
##   HISPANIC/LATINO - CUBAN                       0     0     1      3
##   HISPANIC/LATINO - DOMINICAN                   3     4    15     18
##   HISPANIC/LATINO - GUATEMALAN                  0     2     3      1
##   HISPANIC/LATINO - HONDURAN                    0     0     4      3
##   HISPANIC/LATINO - MEXICAN                     0     0     1      1
##   HISPANIC/LATINO - PUERTO RICAN                2     9    34     32
##   HISPANIC/LATINO - SALVADORAN                  0     0     1      1
##   MULTIPLE RACE/ETHNICITY                       1     0     0      0
##   NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER     0     1     2      7
##   OTHER                                         5    16    58    101
##   PATIENT DECLINED TO ANSWER                    0     5    13     21
##   PORTUGUESE                                    0     1     3     13
##   SOUTH AMERICAN                                0     0     1      4
##   UNABLE TO OBTAIN                              0     5    13     32
##   UNKNOWN                                      12    43   173    465
##   WHITE                                        44   272  1164   2783
##   WHITE - BRAZILIAN                             0     2     4      2
##   WHITE - EASTERN EUROPEAN                      0     2     5     14
##   WHITE - OTHER EUROPEAN                        4     8    49    104
##   WHITE - RUSSIAN                               1     0    15     86
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  stu_data
## X-squared = 240.14, df = 96, p-value = 2.441e-14

Age Group vs GCS Eye Opening

## [1] "Chi-square test for  GCS...Eye.Opening  vs  Age.Group"
##    Y
## X   19-35 36-50 51-65 66-100
##   1     0     7    15     50
##   2     0     2    11     34
##   3     2     2    10     39
##   4    97   417  1727   3964
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  stu_data
## X-squared = 10.291, df = 9, p-value = 0.3274
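
The warning "Chi-squared approximation may be incorrect" appears when some expected cell counts are small. One possible way to check this, sketched below under the assumption that a Monte Carlo p-value is acceptable for this analysis, is to inspect the expected counts and rerun the test with `simulate.p.value = TRUE`. The object names are hypothetical and this is not the code used to produce the output above.

```r
# GCS...Eye.Opening x Age.Group counts, copied from the table above.
gcs_age <- matrix(
  c( 0,   7,   15,   50,
     0,   2,   11,   34,
     2,   2,   10,   39,
    97, 417, 1727, 3964),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    GCS.Eye.Opening = c("1", "2", "3", "4"),
    Age.Group       = c("19-35", "36-50", "51-65", "66-100")
  )
)

# The warning is triggered by small expected counts, so inspect them first.
expected <- suppressWarnings(chisq.test(gcs_age))$expected
n_small  <- sum(expected < 5)  # number of cells with an expected count below 5
n_small

# Monte Carlo p-value: it does not rely on the chi-squared approximation.
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
res_sim <- chisq.test(gcs_age, simulate.p.value = TRUE, B = 5000)
res_sim$p.value
```

For the ethnicity tables, which have many nearly empty rows, merging rare categories before testing would be another option to consider.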

Cleaning Dataset

Removing extreme Out-of-range values

In this part we take a careful look at the ranges and limits of each variable. The variables are grouped by their nature and their clinical significance.

ID Group

##    subject_id       gender        age        mortality   
##  Min.   :10002430   F:2396   Min.   :19.00   Alive:5408  
##  1st Qu.:12494493   M:3981   1st Qu.:61.00   Death: 969  
##  Median :14959313            Median :70.00               
##  Mean   :14975796            Mean   :69.31               
##  3rd Qu.:17439137            3rd Qu.:79.00               
##  Max.   :19997293            Max.   :91.00               
##                                                          
##                   ethnicity   
##  WHITE                 :4263  
##  UNKNOWN               : 693  
##  BLACK/AFRICAN AMERICAN: 428  
##  OTHER                 : 180  
##  WHITE - OTHER EUROPEAN: 165  
##  WHITE - RUSSIAN       : 102  
##  (Other)               : 546

Comments: The identification variables look clean. They will be useful later to split the data and to run other kinds of analysis.

Vital Signs Group

Heart Rate Group

##    Heart.Rate    Heart.rate.Alarm...High Heart.Rate.Alarm...Low
##  Min.   : 43.0   Min.   :    60.0        Min.   :   40.0       
##  1st Qu.: 95.0   1st Qu.:   120.0        1st Qu.:   50.0       
##  Median :109.0   Median :   130.0        Median :   60.0       
##  Mean   :113.1   Mean   :   260.8        Mean   :  168.8       
##  3rd Qu.:128.0   3rd Qu.:   133.8        3rd Qu.:   60.0       
##  Max.   :295.0   Max.   :165130.0        Max.   :60120.0       
##                  NA's   :3               NA's   :4

Next, we count the out-of-range values in the Heart Rate group.

## [1] "Out-of-range values Heart Rate Group"
## Heart Rate:  6
## Heart Rate Alarm High:  65
## Heart Rate Alarm Low:  27
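
A count like the one above can be obtained with a small helper. This is only a sketch: the function name `count_out_of_range`, the toy vector and the limits are all assumptions for illustration, and they are not the limits used in this report.

```r
# Count the values outside [lower, upper]; missing values are ignored.
count_out_of_range <- function(x, lower = -Inf, upper = Inf) {
  sum(x < lower | x > upper, na.rm = TRUE)
}

# Toy vector standing in for a heart-rate column (made-up values).
heart_rate <- c(43, 95, 109, 128, 295, NA, 60120)

# Hypothetical plausibility limits, NOT the ones used in the report.
n_out <- count_out_of_range(heart_rate, lower = 20, upper = 250)
n_out
```

Keeping the limits as function arguments makes it easy to reuse the same helper for every variable group.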

In this part, we remove the out-of-range values of the Heart Rate group by setting them to NA.

## Summary of cleaned Heart Rate Group
##    Heart.Rate  Heart.rate.Alarm...High Heart.Rate.Alarm...Low
##  Min.   : 43   Min.   : 60.0           Min.   : 40.00        
##  1st Qu.: 95   1st Qu.:120.0           1st Qu.: 50.00        
##  Median :109   Median :130.0           Median : 60.00        
##  Mean   :113   Mean   :131.1           Mean   : 60.06        
##  3rd Qu.:128   3rd Qu.:130.0           3rd Qu.: 60.00        
##  Max.   :242   Max.   :250.0           Max.   :180.00        
##  NA's   :6     NA's   :68              NA's   :31

Comments:

References: https://www.heart.org/en/healthy-living/fitness/fitness-basics/target-heart-rates https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6220689/

Blood Pressure Group

##  Arterial.Blood.Pressure.systolic Non.Invasive.Blood.Pressure.systolic
##  Min.   :  0.0                    Min.   :   56.0                     
##  1st Qu.:136.0                    1st Qu.:  135.0                     
##  Median :150.0                    Median :  153.0                     
##  Mean   :154.5                    Mean   :  160.8                     
##  3rd Qu.:168.0                    3rd Qu.:  171.0                     
##  Max.   :742.0                    Max.   :15878.0                     
##  NA's   :2478                     NA's   :45                          
##  Arterial.Blood.Pressure.diastolic Non.Invasive.Blood.Pressure.diastolic
##  Min.   :    1.0                   Min.   :    41.0                     
##  1st Qu.:   69.0                   1st Qu.:    82.0                     
##  Median :   78.0                   Median :    98.0                     
##  Mean   :  174.7                   Mean   :   215.9                     
##  3rd Qu.:   92.0                   3rd Qu.:   115.0                     
##  Max.   :91100.0                   Max.   :105125.0                     
##  NA's   :2477                      NA's   :45

Next, we count the out-of-range values in the Blood Pressure group.

## [1] "Out-of-range values Blood Pressure Group"
## Arterial BP syst:  1
## Non Inv BP syst:  4
## Arterial BP dias:  32
## Non Inv BP dias:  34

Comments: The limits for each variable were chosen by visually inspecting the scatter plots. We use deliberately extreme limits so that outliers that could carry valuable information about the patient’s condition are kept.
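
As a numeric complement to the visual inspection, one can look at the largest values and at the gaps between consecutive sorted values. The sketch below is an assumption about how such a check could look, not the procedure followed in this report; the data are made up and the gap heuristic only finds the single largest jump.

```r
# Toy systolic pressures (made-up values, including two obvious entry errors).
sbp <- c(56, 135, 153, 171, 190, 240, 321, 15878, 16000, NA)

# The largest observations: this is what stands out on a scatter plot.
tail(sort(sbp), 4)

# Gaps between consecutive sorted values: a large jump suggests a cutoff.
sorted <- sort(sbp)                          # sort() drops NA by default
gaps   <- diff(sorted)
cutoff_candidate <- sorted[which.max(gaps)]  # last value before the biggest jump
cutoff_candidate
```

The candidate still has to be checked against clinical references before it is used as a limit.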

## Summary of cleaned Blood Pressure Group
##  Arterial.Blood.Pressure.systolic Non.Invasive.Blood.Pressure.systolic
##  Min.   :  0.0                    Min.   : 56.0                       
##  1st Qu.:136.0                    1st Qu.:135.0                       
##  Median :149.5                    Median :153.0                       
##  Mean   :154.4                    Mean   :154.3                       
##  3rd Qu.:167.8                    3rd Qu.:171.0                       
##  Max.   :357.0                    Max.   :321.0                       
##  NA's   :2479                     NA's   :49                          
##  Arterial.Blood.Pressure.diastolic Non.Invasive.Blood.Pressure.diastolic
##  Min.   :  1.00                    Min.   : 41.0                        
##  1st Qu.: 69.00                    1st Qu.: 82.0                        
##  Median : 78.00                    Median : 97.5                        
##  Mean   : 83.43                    Mean   :100.0                        
##  3rd Qu.: 92.00                    3rd Qu.:115.0                        
##  Max.   :259.00                    Max.   :230.0                        
##  NA's   :2509                      NA's   :79

References: https://www.nursingcenter.com/ncblog/may-2022/non-invasive-blood-pressure#:~:text=Normal%20blood%20pressure%20is%20considered,can%20lead%20to%20inaccurate%20readings.

Respiratory Rate Group

##  Respiratory.Rate  Respiratory.Rate..Set. Respiratory.Rate..spontaneous.
##  Min.   :  15.00   Min.   :   0.00        Min.   :   0.00               
##  1st Qu.:  27.00   1st Qu.:  16.00        1st Qu.:  13.00               
##  Median :  32.00   Median :  18.00        Median :  21.00               
##  Mean   :  34.78   Mean   :  20.02        Mean   :  20.81               
##  3rd Qu.:  38.00   3rd Qu.:  20.00        3rd Qu.:  28.00               
##  Max.   :2037.00   Max.   :1618.00        Max.   :1918.00               
##  NA's   :2         NA's   :2470           NA's   :2300                  
##  Respiratory.Rate..Total. SpO2.Desat.Limit
##  Min.   :   0.00          Min.   : 85.00  
##  1st Qu.:  18.00          1st Qu.: 85.00  
##  Median :  23.00          Median : 88.00  
##  Mean   :  25.74          Mean   : 89.38  
##  3rd Qu.:  29.00          3rd Qu.: 88.00  
##  Max.   :3634.00          Max.   :920.00  
##  NA's   :2286             NA's   :13

Next, we count the out-of-range values in the Respiratory Rate group.

## [1] "Out-of-range values Respiration Group"
## Respiratory rate:  25
## Respiratory Rate (Set):  11
## Respiratory Rate (spontaneous):  11
## Respiratory Rate (Total):  7
## SpO2 Desat Limit:  18

Comments:

  • A respiratory rate of 120 breaths per minute (bpm) is extremely high and generally not sustainable for an extended period in a resting adult. Such a rate would likely indicate severe respiratory distress, significant metabolic demand, or a medical emergency. A person could in theory reach it briefly, but it would be highly abnormal and would warrant immediate medical attention (source: ChatGPT). This is why 120 is used here as the upper limit for the respiratory rate.
  • For the SpO2 Desat Limit, we set the upper limit to 100, which corresponds to an oxygen saturation of 100%.

## Summary of cleaned Respiration Group
##  Respiratory.Rate Respiratory.Rate..Set. Respiratory.Rate..spontaneous.
##  Min.   : 15.00   Min.   : 0.00          Min.   :  0.00                
##  1st Qu.: 27.00   1st Qu.:16.00          1st Qu.: 13.00                
##  Median : 32.00   Median :18.00          Median : 21.00                
##  Mean   : 33.54   Mean   :18.55          Mean   : 19.71                
##  3rd Qu.: 37.00   3rd Qu.:20.00          3rd Qu.: 28.00                
##  Max.   :120.00   Max.   :40.00          Max.   :101.00                
##  NA's   :27       NA's   :2481           NA's   :2311                  
##  Respiratory.Rate..Total. SpO2.Desat.Limit
##  Min.   :  0.00           Min.   : 85.00  
##  1st Qu.: 18.00           1st Qu.: 85.00  
##  Median : 23.00           Median : 88.00  
##  Mean   : 24.55           Mean   : 87.32  
##  3rd Qu.: 29.00           3rd Qu.: 88.00  
##  Max.   :118.00           Max.   :100.00  
##  NA's   :2293             NA's   :31
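
The limits mentioned in the comments above (120 for the respiratory rate and 100 for the SpO2 Desat Limit) can be applied column by column. The following is a minimal sketch on made-up data; the object names are hypothetical, and values equal to a limit are kept, which is consistent with the maxima shown in the cleaned summary.

```r
# Toy data frame with two of the respiration columns (made-up values).
resp <- data.frame(
  Respiratory.Rate = c(15, 32, 120, 2037, NA),
  SpO2.Desat.Limit = c(85, 88, 100, 920, 88)
)

# Upper limits taken from the comments above.
upper_limits <- c(Respiratory.Rate = 120, SpO2.Desat.Limit = 100)

# Set every value above its column's limit to NA and leave the rest untouched.
resp_clean <- resp
for (col in names(upper_limits)) {
  too_high <- !is.na(resp_clean[[col]]) & resp_clean[[col]] > upper_limits[[col]]
  resp_clean[[col]][too_high] <- NA
}

summary(resp_clean)
```

Storing the limits in one named vector keeps them in a single place, so they are easy to review and to cite.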

References: https://www.whoop.com/us/en/thelocker/what-causes-an-increased-respiratory-rate/

Blood Chemistry

Blood Clotting Group

##       INR           Prothrombin.time  
##  Min.   :     0.9   Min.   :     9.3  
##  1st Qu.:     1.2   1st Qu.:    13.5  
##  Median :     1.4   Median :    15.6  
##  Mean   :  5049.9   Mean   :  4905.8  
##  3rd Qu.:     1.8   3rd Qu.:    19.9  
##  Max.   :999999.0   Max.   :999999.0  
##  NA's   :236        NA's   :236

Next, we count the out-of-range values in the Blood Clotting group.

## [1] "Out-of-range values Blood Clotting Group"
## Error Value (999999) INR PT:  61
## INR:  716
## Prothrombin time:  1432

Comments:

  • Both variables contain the suspicious value ‘999999’, which is probably due to a misreading or a malfunction of the device. We can decide later what to do with it.
  • For the INR, I used a value found in the reference as the upper limit. However, that value applies to patients under a certain treatment, so further references should be checked.
  • For the Prothrombin time, I did not find a reliable reference range, so I used the 3rd quartile as a provisional limit just to see the outcome.
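
One simple way to handle the ‘999999’ code is to treat it as missing. The sketch below assumes that this code never represents a real measurement; the function name `replace_sentinel` and the toy data are hypothetical and are not taken from the code of this report.

```r
# Replace a sentinel error code with NA in every numeric column.
replace_sentinel <- function(df, sentinel = 999999) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], function(x) {
    x[!is.na(x) & x == sentinel] <- NA
    x
  })
  df
}

# Toy data (made-up values) with the sentinel in both columns.
clotting <- data.frame(
  INR              = c(0.9, 1.4, 999999, NA),
  Prothrombin.time = c(9.3, 999999, 15.6, 19.9)
)

n_sentinel <- sum(clotting == 999999, na.rm = TRUE)  # sentinel entries found
clotting_clean <- replace_sentinel(clotting)
colSums(is.na(clotting_clean))                       # NA count per column
```

Doing this before any range check keeps the error code from distorting means and quartiles.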

##       INR         Prothrombin.time
##  Min.   : 0.900   Min.   :  9.30  
##  1st Qu.: 1.200   1st Qu.: 13.50  
##  Median : 1.400   Median : 15.60  
##  Mean   : 1.923   Mean   : 20.71  
##  3rd Qu.: 1.800   3rd Qu.: 19.70  
##  Max.   :27.400   Max.   :150.00  
##  NA's   :267      NA's   :266

Comments:

Reference: https://www.mayoclinic.org/tests-procedures/prothrombin-time/about/pac-20384661#:~:text=The%20average%20time%20range%20for,clots%20more%20quickly%20than%20normal.

Electrolytes and Acid-Base Balance Group

##    Anion.gap        Potassium..Whole.Blood.2 Potassium..whole.blood.
##  Min.   :     7.0   Min.   :  1.800          Min.   :     2.1       
##  1st Qu.:    14.0   1st Qu.:  4.400          1st Qu.:     4.3       
##  Median :    16.0   Median :  5.000          Median :     4.9       
##  Mean   :   174.5   Mean   :  5.124          Mean   :  1403.6       
##  3rd Qu.:    20.0   3rd Qu.:  5.600          3rd Qu.:     5.5       
##  Max.   :999999.0   Max.   :134.000          Max.   :999999.0       
##  NA's   :17         NA's   :1441             NA's   :2802           
##  Sodium..whole.blood. Sodium..Whole.Blood Chloride..Whole.Blood
##  Min.   :   118.0     Min.   :115.0       Min.   : 71.0        
##  1st Qu.:   135.0     1st Qu.:136.0       1st Qu.:103.0        
##  Median :   137.0     Median :138.0       Median :106.0        
##  Mean   :   800.3     Mean   :138.3       Mean   :106.1        
##  3rd Qu.:   139.0     3rd Qu.:140.0       3rd Qu.:109.0        
##  Max.   :999999.0     Max.   :187.0       Max.   :139.0        
##  NA's   :3362         NA's   :2121        NA's   :2267         
##  Chloride..whole.blood.  Bicarbonate  
##  Min.   :    11.0       Min.   :13.0  
##  1st Qu.:   104.0       1st Qu.:28.0  
##  Median :   107.0       Median :30.0  
##  Mean   :   462.7       Mean   :30.7  
##  3rd Qu.:   109.0       3rd Qu.:33.0  
##  Max.   :999999.0       Max.   :51.0  
##  NA's   :3569           NA's   :3

Next, we count the out-of-range values in the Electrolytes and Acid-Base Balance group.

## [1] "Out-of-range values Electrolytes and Acid-Base Balance Group"
## Error Value (999999) in: Anion.gap = 1 
## Error Value (999999) in: Potassium..Whole.Blood.2 = 0 
## Error Value (999999) in: Potassium..whole.blood. = 5 
## Error Value (999999) in: Sodium..whole.blood. = 2 
## Error Value (999999) in: Sodium..Whole.Blood = 0 
## Error Value (999999) in: Chloride..Whole.Blood = 0 
## Error Value (999999) in: Chloride..whole.blood. = 1 
## Error Value (999999) in: Bicarbonate = 0

Comments:

##    Anion.gap      Potassium..Whole.Blood.2 Potassium..whole.blood.
##  Min.   :  7.00   Min.   :  1.800          Min.   :  2.100        
##  1st Qu.: 14.00   1st Qu.:  4.400          1st Qu.:  4.300        
##  Median : 16.00   Median :  5.000          Median :  4.900        
##  Mean   : 17.31   Mean   :  5.124          Mean   :  5.006        
##  3rd Qu.: 20.00   3rd Qu.:  5.600          3rd Qu.:  5.500        
##  Max.   :157.00   Max.   :134.000          Max.   :134.000        
##  NA's   :18       NA's   :1441             NA's   :2807           
##  Sodium..whole.blood. Sodium..Whole.Blood Chloride..Whole.Blood
##  Min.   :118.0        Min.   :115.0       Min.   : 71.0        
##  1st Qu.:135.0        1st Qu.:136.0       1st Qu.:103.0        
##  Median :137.0        Median :138.0       Median :106.0        
##  Mean   :137.1        Mean   :138.3       Mean   :106.1        
##  3rd Qu.:139.0        3rd Qu.:140.0       3rd Qu.:109.0        
##  Max.   :187.0        Max.   :187.0       Max.   :139.0        
##  NA's   :3364         NA's   :2121        NA's   :2267         
##  Chloride..whole.blood.  Bicarbonate  
##  Min.   : 11.0          Min.   :13.0  
##  1st Qu.:104.0          1st Qu.:28.0  
##  Median :107.0          Median :30.0  
##  Mean   :106.6          Mean   :30.7  
##  3rd Qu.:109.0          3rd Qu.:33.0  
##  Max.   :141.0          Max.   :51.0  
##  NA's   :3570           NA's   :3

## [1] "Out-of-range values Electrolytes and Acid-Base Balance Group"
## Anion Gap:  1
## Potassium (Whole Blood 1):  1
## Potassium (Whole Blood 2):  1
## Chloride (Whole Blood 2):  1

Comments:

##    Anion.gap     Potassium..Whole.Blood.2 Potassium..whole.blood.
##  Min.   : 7.00   Min.   : 1.800           Min.   : 2.10          
##  1st Qu.:14.00   1st Qu.: 4.400           1st Qu.: 4.30          
##  Median :16.00   Median : 5.000           Median : 4.90          
##  Mean   :17.29   Mean   : 5.098           Mean   : 4.97          
##  3rd Qu.:20.00   3rd Qu.: 5.600           3rd Qu.: 5.50          
##  Max.   :56.00   Max.   :33.000           Max.   :33.00          
##  NA's   :19      NA's   :1442             NA's   :2808           
##  Sodium..whole.blood. Sodium..Whole.Blood Chloride..Whole.Blood
##  Min.   :118.0        Min.   :115.0       Min.   : 71.0        
##  1st Qu.:135.0        1st Qu.:136.0       1st Qu.:103.0        
##  Median :137.0        Median :138.0       Median :106.0        
##  Mean   :137.1        Mean   :138.3       Mean   :106.1        
##  3rd Qu.:139.0        3rd Qu.:140.0       3rd Qu.:109.0        
##  Max.   :187.0        Max.   :187.0       Max.   :139.0        
##  NA's   :3364         NA's   :2121        NA's   :2267         
##  Chloride..whole.blood.  Bicarbonate  
##  Min.   : 71.0          Min.   :13.0  
##  1st Qu.:104.0          1st Qu.:28.0  
##  Median :107.0          Median :30.0  
##  Mean   :106.7          Mean   :30.7  
##  3rd Qu.:109.0          3rd Qu.:33.0  
##  Max.   :141.0          Max.   :51.0  
##  NA's   :3571           NA's   :3

Comments:

Metabolic Parameters

##  Creatinine..serum.  Temperature    Glucose..whole.blood.
##  Min.   :     0.3   Min.   :32.20   Min.   :     35      
##  1st Qu.:     0.9   1st Qu.:36.80   1st Qu.:    147      
##  Median :     1.2   Median :37.20   Median :    173      
##  Mean   :   944.9   Mean   :37.39   Mean   :   2312      
##  3rd Qu.:     2.1   3rd Qu.:37.90   3rd Qu.:    210      
##  Max.   :999999.0   Max.   :40.70   Max.   :1276100      
##  NA's   :14         NA's   :3840    NA's   :2953
## [1] "Out-of-range values for Creatinine (serum), Glucose (Whole Blood), and Temperature"
## Error Value (999999) Creatinine Serum:  6
## Creatinine (serum):  7
## Glucose (Whole Blood):  7
##  Creatinine..serum.  Temperature    Glucose..whole.blood.
##  Min.   : 0.300     Min.   :32.20   Min.   :  35.0       
##  1st Qu.: 0.900     1st Qu.:36.80   1st Qu.: 147.0       
##  Median : 1.200     Median :37.20   Median : 173.0       
##  Mean   : 1.969     Mean   :37.39   Mean   : 187.2       
##  3rd Qu.: 2.100     3rd Qu.:37.90   3rd Qu.: 210.0       
##  Max.   :23.000     Max.   :40.70   Max.   :1183.0       
##  NA's   :21         NA's   :3840    NA's   :2960

Comments:

Hematology Group

##    Hemoglobin     Hemoglobin.2     Hematocrit    Platelet.Count  
##  Min.   : 0.00   Min.   : 5.10   Min.   :18.10   Min.   :   9.0  
##  1st Qu.:10.70   1st Qu.:12.30   1st Qu.:37.60   1st Qu.: 231.0  
##  Median :12.10   Median :13.60   Median :41.40   Median : 304.0  
##  Mean   :12.02   Mean   :13.52   Mean   :41.14   Mean   : 339.9  
##  3rd Qu.:13.40   3rd Qu.:14.80   3rd Qu.:44.60   3rd Qu.: 408.0  
##  Max.   :97.00   Max.   :22.60   Max.   :69.70   Max.   :2660.0  
##  NA's   :2357
## [1] "Out-of-range values for Hemoglobin, Hemoglobin.2, Hematocrit, and Platelet.Count"
## Hemoglobin:  2
## Platelet Count:  11
## Summary of cleaned variables
##    Hemoglobin     Hemoglobin.2     Hematocrit    Platelet.Count  
##  Min.   : 0.00   Min.   : 5.10   Min.   :18.10   Min.   :   9.0  
##  1st Qu.:10.70   1st Qu.:12.30   1st Qu.:37.60   1st Qu.: 231.0  
##  Median :12.10   Median :13.60   Median :41.40   Median : 304.0  
##  Mean   :11.98   Mean   :13.52   Mean   :41.14   Mean   : 337.3  
##  3rd Qu.:13.40   3rd Qu.:14.80   3rd Qu.:44.60   3rd Qu.: 407.0  
##  Max.   :19.40   Max.   :22.60   Max.   :69.70   Max.   :1328.0  
##  NA's   :2359                                    NA's   :11

Comments:

References: https://www.nhlbi.nih.gov/health/thrombocytopenia#:~:text=A%20normal%20platelet%20count%20in,microliter%20is%20lower%20than%20normal.

Checking missing observations again

## The number of total missing values after ranges cleaning is: 37683
## The percentage of mortality is: 0.152 --> In the dataset with ranges cleaned.
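
The two figures above can be computed with a few lines of base R. This is a minimal sketch on a made-up data frame (the name `icu` and its values are assumptions); the last line only redoes the arithmetic with the mortality counts from the ID Group summary.

```r
# Toy data frame (made-up values) standing in for the cleaned dataset.
icu <- data.frame(
  mortality  = factor(c("Alive", "Alive", "Death", "Alive")),
  Heart.Rate = c(95, NA, 128, 109),
  INR        = c(NA, 1.4, NA, 1.2)
)

total_missing  <- sum(is.na(icu))                 # NA count over all cells
missing_by_col <- colSums(is.na(icu))             # NA count per variable
mortality_rate <- mean(icu$mortality == "Death")  # proportion of deaths

# With the counts from the ID Group summary (969 deaths, 5408 alive):
reported_rate <- round(969 / (969 + 5408), 3)
reported_rate
```

Note that 0.152 is a proportion, which corresponds to a mortality of about 15.2%.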

Correlation for Numerical variables after ranges cleaned

Binary variable aggregation

##   [1] "subject_id"                                                       
##   [2] "gender"                                                           
##   [3] "age"                                                              
##   [4] "mortality"                                                        
##   [5] "ethnicity"                                                        
##   [6] "Heart.Rate"                                                       
##   [7] "Heart.rate.Alarm...High"                                          
##   [8] "Heart.Rate.Alarm...Low"                                           
##   [9] "Arterial.Blood.Pressure.systolic"                                 
##  [10] "Non.Invasive.Blood.Pressure.systolic"                             
##  [11] "Arterial.Blood.Pressure.diastolic"                                
##  [12] "Non.Invasive.Blood.Pressure.diastolic"                            
##  [13] "Respiratory.Rate"                                                 
##  [14] "Respiratory.Rate..Set."                                           
##  [15] "Respiratory.Rate..spontaneous."                                   
##  [16] "Respiratory.Rate..Total."                                         
##  [17] "SpO2.Desat.Limit"                                                 
##  [18] "INR"                                                              
##  [19] "Prothrombin.time"                                                 
##  [20] "Anion.gap"                                                        
##  [21] "Creatinine..serum."                                               
##  [22] "Temperature"                                                      
##  [23] "Potassium..Whole.Blood.2"                                         
##  [24] "Potassium..whole.blood."                                          
##  [25] "Sodium..whole.blood."                                             
##  [26] "Sodium..Whole.Blood"                                              
##  [27] "Chloride..Whole.Blood"                                            
##  [28] "Chloride..whole.blood."                                           
##  [29] "Bicarbonate"                                                      
##  [30] "Glucose..whole.blood."                                            
##  [31] "GCS...Eye.Opening"                                                
##  [32] "Hemoglobin"                                                       
##  [33] "Hemoglobin.2"                                                     
##  [34] "Hematocrit"                                                       
##  [35] "Platelet.Count"                                                   
##  [36] "Acute.myocardial.infarction.of.anterolateral.wall..episode.of.c"  
##  [37] "Acute.myocardial.infarction.of.anterolateral.wall..initial.epis"  
##  [38] "Acute.myocardial.infarction.of.anterolateral.wall..subsequent.e"  
##  [39] "Acute.myocardial.infarction.of.other.anterior.wall..episode.of."  
##  [40] "Acute.myocardial.infarction.of.other.anterior.wall..initial.epi"  
##  [41] "Acute.myocardial.infarction.of.other.anterior.wall..subsequent."  
##  [42] "Acute.myocardial.infarction.of.inferolateral.wall..episode.of.c"  
##  [43] "Acute.myocardial.infarction.of.inferolateral.wall..initial.epis"  
##  [44] "Acute.myocardial.infarction.of.inferolateral.wall..subsequent.e"  
##  [45] "Acute.myocardial.infarction.of.inferoposterior.wall..episode.of"  
##  [46] "Acute.myocardial.infarction.of.inferoposterior.wall..initial.ep"  
##  [47] "Acute.myocardial.infarction.of.inferoposterior.wall..subsequent"  
##  [48] "Acute.myocardial.infarction.of.other.inferior.wall..episode.of."  
##  [49] "Acute.myocardial.infarction.of.other.inferior.wall..initial.epi"  
##  [50] "Acute.myocardial.infarction.of.other.inferior.wall..subsequent."  
##  [51] "Acute.myocardial.infarction.of.other.lateral.wall..episode.of.c"  
##  [52] "Acute.myocardial.infarction.of.other.lateral.wall..initial.epis"  
##  [53] "Acute.myocardial.infarction.of.other.lateral.wall..subsequent.e"  
##  [54] "Acute.myocardial.infarction.of.other.specified.sites..episode.o"  
##  [55] "Acute.myocardial.infarction.of.other.specified.sites..initial.e"  
##  [56] "Acute.myocardial.infarction.of.other.specified.sites..subsequen"  
##  [57] "Acute.myocardial.infarction.of.unspecified.site..episode.of.car"  
##  [58] "Acute.myocardial.infarction.of.unspecified.site..initial.episod"  
##  [59] "Acute.myocardial.infarction.of.unspecified.site..subsequent.epi"  
##  [60] "Postmyocardial.infarction.syndrome"                               
##  [61] "Acute.coronary.occlusion.without.myocardial.infarction"           
##  [62] "Old.myocardial.infarction"                                        
##  [63] "Certain.sequelae.of.myocardial.infarction..not.elsewhere.classi"  
##  [64] "Acute.myocardial.infarction"                                      
##  [65] "ST.elevation..STEMI..myocardial.infarction.of.anterior.wall"      
##  [66] "ST.elevation..STEMI..myocardial.infarction.involving.left.main"   
##  [67] "ST.elevation..STEMI..myocardial.infarction.involving.left.anter"  
##  [68] "ST.elevation..STEMI..myocardial.infarction.involving.other.coro"  
##  [69] "ST.elevation..STEMI..myocardial.infarction.of.inferior.wall"      
##  [70] "ST.elevation..STEMI..myocardial.infarction.involving.right.coro"  
##  [71] "ST.elevation..STEMI..myocardial.infarction.involving.other.coro.2"
##  [72] "ST.elevation..STEMI..myocardial.infarction.of.other.sites"        
##  [73] "ST.elevation..STEMI..myocardial.infarction.involving.left.circu"  
##  [74] "ST.elevation..STEMI..myocardial.infarction.involving.other.site"  
##  [75] "ST.elevation..STEMI..myocardial.infarction.of.unspecified.site"   
##  [76] "Non.ST.elevation..NSTEMI..myocardial.infarction"                  
##  [77] "Acute.myocardial.infarction..unspecified"                         
##  [78] "Other.type.of.myocardial.infarction"                              
##  [79] "Myocardial.infarction.type.2"                                     
##  [80] "Other.myocardial.infarction.type"                                 
##  [81] "Subsequent.ST.elevation..STEMI..and.non.ST.elevation..NSTEMI..m"  
##  [82] "Subsequent.ST.elevation..STEMI..myocardial.infarction.of.anteri"  
##  [83] "Subsequent.ST.elevation..STEMI..myocardial.infarction.of.inferi"  
##  [84] "Subsequent.non.ST.elevation..NSTEMI..myocardial.infarction"       
##  [85] "Subsequent.ST.elevation..STEMI..myocardial.infarction.of.other"   
##  [86] "Subsequent.ST.elevation..STEMI..myocardial.infarction.of.unspec"  
##  [87] "Certain.current.complications.following.ST.elevation..STEMI..an"  
##  [88] "Hemopericardium.as.current.complication.following.acute.myocard"  
##  [89] "Atrial.septal.defect.as.current.complication.following.acute.my"  
##  [90] "Ventricular.septal.defect.as.current.complication.following.acu"  
##  [91] "Rupture.of.cardiac.wall.without.hemopericardium.as.current.comp"  
##  [92] "Rupture.of.chordae.tendineae.as.current.complication.following"   
##  [93] "Rupture.of.papillary.muscle.as.current.complication.following.a"  
##  [94] "Thrombosis.of.atrium..auricular.appendage..and.ventricle.as.cur"  
##  [95] "Other.current.complications.following.acute.myocardial.infarcti"  
##  [96] "Acute.coronary.thrombosis.not.resulting.in.myocardial.infarctio"  
##  [97] "Old.myocardial.infarction.2"                                      
##  [98] "Rheumatic.heart.failure..congestive."                             
##  [99] "Congestive.heart.failure..unspecified"                            
## [100] "Systolic..congestive..heart.failure"                              
## [101] "Unspecified.systolic..congestive..heart.failure"                  
## [102] "Acute.systolic..congestive..heart.failure"                        
## [103] "Chronic.systolic..congestive..heart.failure"                      
## [104] "Acute.on.chronic.systolic..congestive..heart.failure"             
## [105] "Diastolic..congestive..heart.failure"                             
## [106] "Unspecified.diastolic..congestive..heart.failure"                 
## [107] "Acute.diastolic..congestive..heart.failure"                       
## [108] "Chronic.diastolic..congestive..heart.failure"                     
## [109] "Acute.on.chronic.diastolic..congestive..heart.failure"            
## [110] "Combined.systolic..congestive..and.diastolic..congestive..heart"  
## [111] "Unspecified.combined.systolic..congestive..and.diastolic..conge"  
## [112] "Acute.combined.systolic..congestive..and.diastolic..congestive."  
## [113] "Chronic.combined.systolic..congestive..and.diastolic..congestiv"  
## [114] "Acute.on.chronic.combined.systolic..congestive..and.diastolic.."  
## [115] "Atrial.fibrillation"                                              
## [116] "Atrial.fibrillation.and.flutter"                                  
## [117] "Paroxysmal.atrial.fibrillation"                                   
## [118] "Persistent.atrial.fibrillation"                                   
## [119] "Longstanding.persistent.atrial.fibrillation"                      
## [120] "Other.persistent.atrial.fibrillation"                             
## [121] "Chronic.atrial.fibrillation"                                      
## [122] "Chronic.atrial.fibrillation..unspecified"                         
## [123] "Permanent.atrial.fibrillation"                                    
## [124] "Unspecified.atrial.fibrillation.and.atrial.flutter"               
## [125] "Unspecified.atrial.fibrillation"                                  
## [126] "Other.chronic.obstructive.pulmonary.disease"                      
## [127] "Chronic.obstructive.pulmonary.disease.with..acute..lower.respir"  
## [128] "Chronic.obstructive.pulmonary.disease.with..acute..exacerbation"  
## [129] "Chronic.obstructive.pulmonary.disease..unspecified"               
## [130] "Heat.stroke.and.sunstroke"                                        
## [131] "Brain.stem.stroke.syndrome"                                       
## [132] "Cerebellar.stroke.syndrome"                                       
## [133] "National.Institutes.of.Health.Stroke.Scale..NIHSS..score"         
## [134] "Heatstroke.and.sunstroke"                                         
## [135] "Heatstroke.and.sunstroke.2"                                       
## [136] "Heatstroke.and.sunstroke..initial.encounter"                      
## [137] "Heatstroke.and.sunstroke..subsequent.encounter"                   
## [138] "Heatstroke.and.sunstroke..sequela"                                
## [139] "Exertional.heatstroke"                                            
## [140] "Exertional.heatstroke..initial.encounter"                         
## [141] "Exertional.heatstroke..subsequent.encounter"                      
## [142] "Exertional.heatstroke..sequela"                                   
## [143] "Other.heatstroke.and.sunstroke"                                   
## [144] "Other.heatstroke.and.sunstroke..initial.encounter"                
## [145] "Other.heatstroke.and.sunstroke..subsequent.encounter"             
## [146] "Other.heatstroke.and.sunstroke..sequela"                          
## [147] "Heatstroke.and.sunstroke..initial.encounter.2"                    
## [148] "Heatstroke.and.sunstroke..subsequent.encounter.2"                 
## [149] "Heatstroke.and.sunstroke..sequela.2"                              
## [150] "Family.history.of.stroke..cerebrovascular."                       
## [151] "Family.history.of.stroke"                                         
## [152] "Mixed.hyperlipidemia"                                             
## [153] "Other.and.unspecified.hyperlipidemia"                             
## [154] "Mixed.hyperlipidemia.2"                                           
## [155] "Other.hyperlipidemia"                                             
## [156] "Other.hyperlipidemia.2"                                           
## [157] "Hyperlipidemia..unspecified"                                      
## [158] "Other.chronic.obstructive.pulmonary.disease.2"                    
## [159] "Chronic.obstructive.pulmonary.disease.with..acute..lower.respir.2"
## [160] "Chronic.obstructive.pulmonary.disease.with..acute..exacerbation.2"
## [161] "Chronic.obstructive.pulmonary.disease..unspecified.2"             
## [162] "Senile.dementia..uncomplicated"                                   
## [163] "Presenile.dementia..uncomplicated"                                
## [164] "Presenile.dementia.with.delirium"                                 
## [165] "Presenile.dementia.with.delusional.features"                      
## [166] "Presenile.dementia.with.depressive.features"                      
## [167] "Senile.dementia.with.delusional.features"                         
## [168] "Senile.dementia.with.depressive.features"                         
## [169] "Senile.dementia.with.delirium"                                    
## [170] "Vascular.dementia..uncomplicated"                                 
## [171] "Vascular.dementia..with.delirium"                                 
## [172] "Vascular.dementia..with.delusions"                                
## [173] "Vascular.dementia..with.depressed.mood"                           
## [174] "Alcohol.induced.persisting.dementia"                              
## [175] "Drug.induced.persisting.dementia"                                 
## [176] "Dementia.in.conditions.classified.elsewhere.without.behavioral"   
## [177] "Dementia.in.conditions.classified.elsewhere.with.behavioral.dis"  
## [178] "Dementia..unspecified..without.behavioral.disturbance"            
## [179] "Dementia..unspecified..with.behavioral.disturbance"               
## [180] "Other.frontotemporal.dementia"                                    
## [181] "Dementia.with.lewy.bodies"                                        
## [182] "Vascular.dementia"                                                
## [183] "Vascular.dementia.2"                                              
## [184] "Vascular.dementia.without.behavioral.disturbance"                 
## [185] "Vascular.dementia.with.behavioral.disturbance"                    
## [186] "Dementia.in.other.diseases.classified.elsewhere"                  
## [187] "Dementia.in.other.diseases.classified.elsewhere.2"                
## [188] "Dementia.in.other.diseases.classified.elsewhere.without.behavio"  
## [189] "Dementia.in.other.diseases.classified.elsewhere.with.behavioral"  
## [190] "Unspecified.dementia"                                             
## [191] "Unspecified.dementia.2"                                           
## [192] "Unspecified.dementia.without.behavioral.disturbance"              
## [193] "Unspecified.dementia.with.behavioral.disturbance"                 
## [194] "Alcohol.dependence.with.alcohol.induced.persisting.dementia"      
## [195] "Alcohol.use..unspecified.with.alcohol.induced.persisting.dement"  
## [196] "Sedative..hypnotic.or.anxiolytic.dependence.with.sedative..hypn"  
## [197] "Sedative..hypnotic.or.anxiolytic.use..unspecified.with.sedative"  
## [198] "Inhalant.abuse.with.inhalant.induced.dementia"                    
## [199] "Inhalant.dependence.with.inhalant.induced.dementia"               
## [200] "Inhalant.use..unspecified.with.inhalant.induced.persisting.deme"  
## [201] "Other.psychoactive.substance.abuse.with.psychoactive.substance."  
## [202] "Other.psychoactive.substance.dependence.with.psychoactive.subst"  
## [203] "Other.psychoactive.substance.use..unspecified.with.psychoactive"  
## [204] "Frontotemporal.dementia"                                          
## [205] "Other.frontotemporal.dementia.2"                                  
## [206] "Dementia.with.Lewy.bodies"                                        
## [207] "Age.Group"                                                        
## [208] "Myocardial"                                                       
## [209] "Rupture"                                                          
## [210] "Thrombosis"                                                       
## [211] "Systolic"                                                         
## [212] "Diastolic"                                                        
## [213] "Comb_DS"                                                          
## [214] "Fibrillation"                                                     
## [215] "PulmonaryDisease"                                                 
## [216] "Stroke"                                                           
## [217] "Hyperlipidemia"                                                   
## [218] "Dementia"

Now we are going to remove from the data set the variables that were summarized.

## [1] 6377   47
##  [1] "subject_id"                           
##  [2] "gender"                               
##  [3] "age"                                  
##  [4] "mortality"                            
##  [5] "ethnicity"                            
##  [6] "Heart.Rate"                           
##  [7] "Heart.rate.Alarm...High"              
##  [8] "Heart.Rate.Alarm...Low"               
##  [9] "Arterial.Blood.Pressure.systolic"     
## [10] "Non.Invasive.Blood.Pressure.systolic" 
## [11] "Arterial.Blood.Pressure.diastolic"    
## [12] "Non.Invasive.Blood.Pressure.diastolic"
## [13] "Respiratory.Rate"                     
## [14] "Respiratory.Rate..Set."               
## [15] "Respiratory.Rate..spontaneous."       
## [16] "Respiratory.Rate..Total."             
## [17] "SpO2.Desat.Limit"                     
## [18] "INR"                                  
## [19] "Prothrombin.time"                     
## [20] "Anion.gap"                            
## [21] "Creatinine..serum."                   
## [22] "Temperature"                          
## [23] "Potassium..Whole.Blood.2"             
## [24] "Potassium..whole.blood."              
## [25] "Sodium..whole.blood."                 
## [26] "Sodium..Whole.Blood"                  
## [27] "Chloride..Whole.Blood"                
## [28] "Chloride..whole.blood."               
## [29] "Bicarbonate"                          
## [30] "Glucose..whole.blood."                
## [31] "GCS...Eye.Opening"                    
## [32] "Hemoglobin"                           
## [33] "Hemoglobin.2"                         
## [34] "Hematocrit"                           
## [35] "Platelet.Count"                       
## [36] "Age.Group"                            
## [37] "Myocardial"                           
## [38] "Rupture"                              
## [39] "Thrombosis"                           
## [40] "Systolic"                             
## [41] "Diastolic"                            
## [42] "Comb_DS"                              
## [43] "Fibrillation"                         
## [44] "PulmonaryDisease"                     
## [45] "Stroke"                               
## [46] "Hyperlipidemia"                       
## [47] "Dementia"

Combining Columns

Arterial.Blood.Pressure.systolic and Non.Invasive.Blood.Pressure.systolic

  1. Arterial Blood Pressure (ABP) Systolic: obtained using a more invasive and potentially more accurate method.
  2. Non-Invasive Blood Pressure (NIBP) Systolic: obtained using a non-invasive method that is more convenient but may be less accurate than ABP.

With a correlation of approximately 0.08325, the Arterial.Blood.Pressure.systolic and Non.Invasive.Blood.Pressure.systolic variables have a very weak positive correlation. This means they are not strongly related, so consolidating them might not do much to improve data quality. However, both variables measure systolic blood pressure, only by different methods, so we are going to create a new variable by averaging the two.

## [1] TRUE

This code calculates the row-wise mean of the two blood pressure variables and creates a new variable AvgBloodPressureSystolic in the dataset.
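The row-wise averaging described above can be sketched as follows. This is a minimal illustration on made-up values, not the report's original chunk; it assumes that a missing reading in one column is skipped (`na.rm = TRUE`) so that the other reading is used alone, and the data frame name `bp` is hypothetical.

```r
# Toy stand-in for the two systolic columns (values are illustrative only).
bp <- data.frame(
  Arterial.Blood.Pressure.systolic     = c(120, NA, 135, NA),
  Non.Invasive.Blood.Pressure.systolic = c(118, 110, NA, NA)
)

sys_cols <- c("Arterial.Blood.Pressure.systolic",
              "Non.Invasive.Blood.Pressure.systolic")

# Row-wise mean; na.rm = TRUE falls back to whichever reading is present.
bp$AvgBloodPressureSystolic <- rowMeans(bp[, sys_cols], na.rm = TRUE)

# Rows with no reading at all come back as NaN; store them as NA instead.
bp$AvgBloodPressureSystolic[is.nan(bp$AvgBloodPressureSystolic)] <- NA

# Check that the new column exists.
"AvgBloodPressureSystolic" %in% names(bp)
```

Under this assumption the averaged column is missing only when both source readings are missing, which is why its missing count can be far lower than that of the arterial measurement alone.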

Arterial.Blood.Pressure.diastolic and Non.Invasive.Blood.Pressure.diastolic

  1. Arterial Blood Pressure (ABP) Diastolic: more invasive but potentially more accurate
  2. Non-Invasive Blood Pressure (NIBP) Diastolic: less invasive but may be less accurate in certain situations.

With a correlation of approximately -0.0009740, the Arterial.Blood.Pressure.diastolic and Non.Invasive.Blood.Pressure.diastolic variables are essentially uncorrelated (the coefficient is very slightly negative). As with the systolic blood pressure variables, consolidating them might not do much to improve data quality or reduce redundancy. However, both variables measure diastolic blood pressure, only by different methods, so we are going to create a new variable by averaging the two.

## [1] TRUE

This code calculates the row-wise mean of the two blood pressure variables and creates a new variable AvgBloodPressureDiastolic in the dataset.

Combining Respiratory Rate variables

  1. Respiratory.Rate: general measurement of breaths per minute.
  2. Respiratory.Rate..Set.: fixed (set) respiratory rate during mechanical ventilation.
  3. Respiratory.Rate..spontaneous.: natural respiratory rate, without any external intervention to control breathing.
  4. Respiratory.Rate..Total.: cumulative respiratory rate over a period.

Correlation matrix of Respiratory rate

|                               | Respiratory.Rate..Set. | Respiratory.Rate..spontaneous. | Respiratory.Rate..Total. | Respiratory.Rate |
|:------------------------------|-----------------------:|-------------------------------:|-------------------------:|-----------------:|
|Respiratory.Rate..Set.         |              1.0000000 |                      0.2653949 |                0.5113597 |        0.3130305 |
|Respiratory.Rate..spontaneous. |              0.2653949 |                      1.0000000 |                0.6906674 |        0.3578672 |
|Respiratory.Rate..Total.       |              0.5113597 |                      0.6906674 |                1.0000000 |        0.3994472 |
|Respiratory.Rate               |              0.3130305 |                      0.3578672 |                0.3994472 |        1.0000000 |

Even though they are not strongly correlated, we can still consolidate them into one variable, because all of them are different ways of measuring or recording the same respiratory rate.

## [1] TRUE

This code calculates the row-wise mean of the four respiratory rate variables, creating a new variable ConsolidatedRespiratoryRate.
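A correlation matrix like the one above can be computed even when the columns have different missing patterns. The sketch below uses made-up values and assumes pairwise-complete observations (`use = "pairwise.complete.obs"`); the report does not state which option was actually used, and the data frame name `rr` is hypothetical.

```r
# Toy stand-in for the four respiratory rate columns (illustrative values).
rr <- data.frame(
  Respiratory.Rate..Set.         = c(14, 16, NA, 18, 12, 20),
  Respiratory.Rate..spontaneous. = c(NA, 15, 17, 22, 10, 19),
  Respiratory.Rate..Total.       = c(15, 17, 18, 23, NA, 21),
  Respiratory.Rate               = c(16, 18, 19, 21, 13, 20)
)

# Each pairwise correlation uses the rows where both columns are observed.
cor_mat <- cor(rr, use = "pairwise.complete.obs")

# Consolidate the four measurements into a single row-wise mean,
# skipping whichever measurements are missing in a given row.
rr$ConsolidatedRespiratoryRate <- rowMeans(rr[, 1:4], na.rm = TRUE)
```

With pairwise deletion each coefficient may be based on a different subset of patients, so the entries of the matrix are not strictly comparable with each other.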

Consolidate and average the following duplicated variables

  1. Potassium..Whole.Blood.2 + Potassium..whole.blood.
  2. Sodium..whole.blood. + Sodium..Whole.Blood
  3. Chloride..Whole.Blood + Chloride..whole.blood.
  4. Hemoglobin + Hemoglobin.2
## [1] TRUE

Remove the columns
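One way to average each duplicated pair and then drop the source columns is sketched below. The toy values, the data frame name `labs`, and the lookup list `dup_pairs` are hypothetical; only the column names and the `Avg...` names come from this report.

```r
# Toy stand-in with two of the duplicated pairs (illustrative values).
labs <- data.frame(
  Potassium..Whole.Blood.2 = c(4.1, NA, 3.9),
  Potassium..whole.blood.  = c(4.3, 4.0, NA),
  Hemoglobin               = c(NA, 10.2, 11.0),
  Hemoglobin.2             = c(12.0, 10.4, 11.2)
)

# New column name -> the two source columns it replaces.
dup_pairs <- list(
  AvgPotassium  = c("Potassium..Whole.Blood.2", "Potassium..whole.blood."),
  AvgHemoglobin = c("Hemoglobin", "Hemoglobin.2")
)

for (new_name in names(dup_pairs)) {
  avg <- rowMeans(labs[, dup_pairs[[new_name]]], na.rm = TRUE)
  avg[is.nan(avg)] <- NA          # both sources missing -> NA
  labs[[new_name]] <- avg
}

# Remove the source columns now that the averages exist.
labs <- labs[, !(names(labs) %in% unlist(dup_pairs)), drop = FALSE]
```

The same list could be extended with the sodium and chloride pairs, and with the blood pressure and respiratory rate columns combined earlier.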

Calculate again the missing value percentages
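A missing-value summary of this kind can be produced in base R as sketched below. The toy data and the name `df` are hypothetical; the column names `missing_count` and `missing_percentage` follow the summary table in this report.

```r
# Toy data frame (illustrative values only).
df <- data.frame(
  Temperature = c(NA, 36.8, NA, NA),
  INR         = c(1.1, NA, 1.3, 1.0),
  Heart.Rate  = c(80, 92, 75, 88)
)

# Count missing values per column and express them as a percentage of rows.
missing_count <- colSums(is.na(df))
missing_summary <- data.frame(
  missing_count      = missing_count,
  missing_percentage = 100 * missing_count / nrow(df)
)

# Keep only variables with at least one missing value, most affected first.
missing_summary <- missing_summary[missing_summary$missing_count > 0, ]
missing_summary <- missing_summary[order(-missing_summary$missing_count), ]
missing_summary
```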

Missing Value Summary After Combining Features

|Variable                    | missing_count| missing_percentage|
|:---------------------------|-------------:|------------------:|
|Temperature                 |          3840|         60.2164027|
|Glucose..whole.blood.       |          2960|         46.4168104|
|AvgChloride                 |          2265|         35.5182688|
|AvgSodium                   |          2119|         33.2287910|
|AvgPotassium                |          1442|         22.6125137|
|INR                         |           267|          4.1869218|
|Prothrombin.time            |           266|          4.1712404|
|Heart.rate.Alarm...High     |            68|          1.0663321|
|Heart.Rate.Alarm...Low      |            31|          0.4861220|
|SpO2.Desat.Limit            |            31|          0.4861220|
|Creatinine..serum.          |            21|          0.3293085|
|Anion.gap                   |            19|          0.2979457|
|AvgBloodPressureDiastolic   |            18|          0.2822644|
|Platelet.Count              |            11|          0.1724949|
|Heart.Rate                  |             6|          0.0940881|
|AvgBloodPressureSystolic    |             4|          0.0627254|
|Bicarbonate                 |             3|          0.0470441|
|ConsolidatedRespiratoryRate |             2|          0.0313627|

After combining some variables we can see these differences in the missing value percentages:

Calculating Missingness

Before combining variables

|Variable                              | Before_Missing_Count|
|:-------------------------------------|--------------------:|
|Arterial.Blood.Pressure.systolic      |                 2478|
|Non.Invasive.Blood.Pressure.systolic  |                   45|
|Arterial.Blood.Pressure.diastolic     |                 2477|
|Non.Invasive.Blood.Pressure.diastolic |                   45|
|Respiratory.Rate..Set.                |                 2470|
|Respiratory.Rate..spontaneous.        |                 2300|
|Respiratory.Rate..Total.              |                 2286|
|Respiratory.Rate                      |                    2|
|Chloride..whole.blood.                |                 3569|
|Chloride..Whole.Blood                 |                 2267|
|Sodium..whole.blood                   |                 3362|
|Sodium..Whole.Blood                   |                 2121|
|Potassium..whole.blood.               |                 2802|
|Potassium..Whole.Blood.2              |                 1441|
|Hemoglobin                            |                 2357|
|Hemoglobin.2                          |                    0|

After combining variables

|Variable                    | After_Missing_Count|
|:---------------------------|-------------------:|
|AvgBloodPressureSystolic    |                   2|
|AvgBloodPressureDiastolic   |                   2|
|ConsolidatedRespiratoryRate |                   0|
|AvgChloride                 |                2265|
|AvgSodium                   |                2119|
|AvgPotassium                |                1441|
|AvgHemoglobin               |                   0|

Before imputing missing values for the lab test and vital sign variables in the mimic_iv_clean data set, we run statistical tests to check whether missingness is associated with patient characteristics:

Missingness by group

Missing values should be imputed based on similar cases. To identify such cases, we are going to calculate the proportion of missing values for each variable by age, gender and ethnicity, using only the variables with the highest percentage of missing values.

Why we considered Age, Gender and Ethnicity:

  1. Age: Age can be an important factor in healthcare data analysis as different age groups may have different patterns of missingness. For example, certain medical tests or measurements may be more common or relevant in specific age groups, leading to different rates of missing data.

  2. Gender: Gender can also play a role in healthcare and medical data. Some conditions or tests may be more prevalent or important for one gender than the other, leading to differences in missing data patterns.

  3. Ethnicity: Ethnicity can be associated with various health factors and medical conditions, which could influence the presence or absence of certain variables in the data set.
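The group-wise check can be sketched as follows: compute the proportion of missing values per group, then test whether missingness is independent of the grouping variable with a chi-square test. The data below are made up and the name `toy` is hypothetical; `chi_sq_variable` mirrors the object name visible in the test output of this report, but how it was actually built there is an assumption.

```r
# Toy data: 100 women and 100 men, with Temperature missing for
# 60 of the women and 40 of the men (illustrative values only).
toy <- data.frame(
  gender      = factor(rep(c("F", "M"), each = 100)),
  Temperature = 37
)
toy$Temperature[c(1:60, 101:140)] <- NA

is_missing <- is.na(toy$Temperature)

# Proportion of missing Temperature values within each gender.
prop_missing <- tapply(is_missing, toy$gender, mean)

# Contingency table of group versus missing / observed.
chi_sq_variable <- table(gender = toy$gender, missing = is_missing)

# Pearson's chi-square test of independence; for a 2 x 2 table R
# applies Yates' continuity correction by default.
result <- chisq.test(chi_sq_variable)
result
```

A small p-value indicates that the missingness rate differs between groups. With many sparse groups, as for ethnicity, `chisq.test()` warns that the approximation may be incorrect; `chisq.test(..., simulate.p.value = TRUE)` is one documented alternative.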

Missingness by Age.Group

## 
## Summary table for Temperature 
## # A tibble: 4 × 4
##   Age.Group min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 19-35           0.535       0.535        0.535
## 2 36-50           0.603       0.603        0.603
## 3 51-65           0.584       0.584        0.584
## 4 66-100          0.612       0.612        0.612
## [1] "Chi-square test result for Temperature"
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 5.9139, df = 3, p-value = 0.1159

## 
## Summary table for Glucose..whole.blood. 
## # A tibble: 4 × 4
##   Age.Group min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 19-35           0.545       0.545        0.545
## 2 36-50           0.393       0.393        0.393
## 3 51-65           0.372       0.372        0.372
## 4 66-100          0.510       0.510        0.510
## [1] "Chi-square test result for Glucose..whole.blood."
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 106.31, df = 3, p-value < 2.2e-16

## 
## Summary table for AvgChloride 
## # A tibble: 4 × 4
##   Age.Group min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 19-35           0.434       0.434        0.434
## 2 36-50           0.285       0.285        0.285
## 3 51-65           0.279       0.279        0.279
## 4 66-100          0.393       0.393        0.393
## [1] "Chi-square test result for AvgChloride"
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 82.618, df = 3, p-value < 2.2e-16

## 
## Summary table for AvgSodium 
## # A tibble: 4 × 4
##   Age.Group min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 19-35           0.404       0.404        0.404
## 2 36-50           0.266       0.266        0.266
## 3 51-65           0.260       0.260        0.260
## 4 66-100          0.368       0.368        0.368
## [1] "Chi-square test result for AvgSodium"
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 75.938, df = 3, p-value = 2.281e-16

## 
## Summary table for AvgPotassium 
## # A tibble: 4 × 4
##   Age.Group min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 19-35           0.273       0.273        0.273
## 2 36-50           0.171       0.171        0.171
## 3 51-65           0.175       0.175        0.175
## 4 66-100          0.253       0.253        0.253
## [1] "Chi-square test result for AvgPotassium"
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 51.396, df = 3, p-value = 4.029e-11

Missingness by Gender

## 
## Summary table for Temperature 
## # A tibble: 2 × 4
##   gender min_missing max_missing mean_missing
##   <fct>        <dbl>       <dbl>        <dbl>
## 1 F            0.591       0.591        0.591
## 2 M            0.609       0.609        0.609
## [1] "Chi-square test result for Temperature"
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 2.0776, df = 1, p-value = 0.1495

## 
## Summary table for Glucose..whole.blood. 
## # A tibble: 2 × 4
##   gender min_missing max_missing mean_missing
##   <fct>        <dbl>       <dbl>        <dbl>
## 1 F            0.553       0.553        0.553
## 2 M            0.411       0.411        0.411
## [1] "Chi-square test result for Glucose..whole.blood."
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 120.07, df = 1, p-value < 2.2e-16

## 
## Summary table for AvgChloride 
## # A tibble: 2 × 4
##   gender min_missing max_missing mean_missing
##   <fct>        <dbl>       <dbl>        <dbl>
## 1 F            0.430       0.430        0.430
## 2 M            0.310       0.310        0.310
## [1] "Chi-square test result for AvgChloride"
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 94.036, df = 1, p-value < 2.2e-16

## 
## Summary table for AvgSodium 
## # A tibble: 2 × 4
##   gender min_missing max_missing mean_missing
##   <fct>        <dbl>       <dbl>        <dbl>
## 1 F            0.403       0.403        0.403
## 2 M            0.290       0.290        0.290
## [1] "Chi-square test result for AvgSodium"
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 85.388, df = 1, p-value < 2.2e-16

## 
## Summary table for AvgPotassium 
## # A tibble: 2 × 4
##   gender min_missing max_missing mean_missing
##   <fct>        <dbl>       <dbl>        <dbl>
## 1 F            0.283       0.283        0.283
## 2 M            0.192       0.192        0.192
## [1] "Chi-square test result for AvgPotassium"
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 71.397, df = 1, p-value < 2.2e-16

Missingness by Ethnicity

## 
## Summary table for Temperature 
## # A tibble: 33 × 4
##    ethnicity                     min_missing max_missing mean_missing
##    <fct>                               <dbl>       <dbl>        <dbl>
##  1 AMERICAN INDIAN/ALASKA NATIVE       0.5         0.5          0.5  
##  2 ASIAN                               0.627       0.627        0.627
##  3 ASIAN - ASIAN INDIAN                0.824       0.824        0.824
##  4 ASIAN - CHINESE                     0.531       0.531        0.531
##  5 ASIAN - KOREAN                      0.333       0.333        0.333
##  6 ASIAN - SOUTH EAST ASIAN            0.438       0.438        0.438
##  7 BLACK/AFRICAN                       0.647       0.647        0.647
##  8 BLACK/AFRICAN AMERICAN              0.474       0.474        0.474
##  9 BLACK/CAPE VERDEAN                  0.545       0.545        0.545
## 10 BLACK/CARIBBEAN ISLAND              0.533       0.533        0.533
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect
## [1] "Chi-square test result for Temperature"
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 85.386, df = 32, p-value = 9.503e-07
## 
## Summary table for Glucose..whole.blood. 
## # A tibble: 33 × 4
##    ethnicity                     min_missing max_missing mean_missing
##    <fct>                               <dbl>       <dbl>        <dbl>
##  1 AMERICAN INDIAN/ALASKA NATIVE       0.8         0.8          0.8  
##  2 ASIAN                               0.608       0.608        0.608
##  3 ASIAN - ASIAN INDIAN                0.471       0.471        0.471
##  4 ASIAN - CHINESE                     0.562       0.562        0.562
##  5 ASIAN - KOREAN                      0.667       0.667        0.667
##  6 ASIAN - SOUTH EAST ASIAN            0.438       0.438        0.438
##  7 BLACK/AFRICAN                       0.824       0.824        0.824
##  8 BLACK/AFRICAN AMERICAN              0.549       0.549        0.549
##  9 BLACK/CAPE VERDEAN                  0.591       0.591        0.591
## 10 BLACK/CARIBBEAN ISLAND              0.6         0.6          0.6  
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect

## [1] "Chi-square test result for Glucose..whole.blood."
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 73.816, df = 32, p-value = 3.768e-05
## 
## Summary table for AvgChloride 
## # A tibble: 33 × 4
##    ethnicity                     min_missing max_missing mean_missing
##    <fct>                               <dbl>       <dbl>        <dbl>
##  1 AMERICAN INDIAN/ALASKA NATIVE       0.8         0.8          0.8  
##  2 ASIAN                               0.510       0.510        0.510
##  3 ASIAN - ASIAN INDIAN                0.471       0.471        0.471
##  4 ASIAN - CHINESE                     0.375       0.375        0.375
##  5 ASIAN - KOREAN                      0.333       0.333        0.333
##  6 ASIAN - SOUTH EAST ASIAN            0.25        0.25         0.25 
##  7 BLACK/AFRICAN                       0.471       0.471        0.471
##  8 BLACK/AFRICAN AMERICAN              0.329       0.329        0.329
##  9 BLACK/CAPE VERDEAN                  0.5         0.5          0.5  
## 10 BLACK/CARIBBEAN ISLAND              0.467       0.467        0.467
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect

## [1] "Chi-square test result for AvgChloride"
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 74.873, df = 32, p-value = 2.726e-05
## 
## Summary table for AvgSodium 
## # A tibble: 33 × 4
##    ethnicity                     min_missing max_missing mean_missing
##    <fct>                               <dbl>       <dbl>        <dbl>
##  1 AMERICAN INDIAN/ALASKA NATIVE       0.8         0.8          0.8  
##  2 ASIAN                               0.451       0.451        0.451
##  3 ASIAN - ASIAN INDIAN                0.412       0.412        0.412
##  4 ASIAN - CHINESE                     0.328       0.328        0.328
##  5 ASIAN - KOREAN                      0.333       0.333        0.333
##  6 ASIAN - SOUTH EAST ASIAN            0.312       0.312        0.312
##  7 BLACK/AFRICAN                       0.529       0.529        0.529
##  8 BLACK/AFRICAN AMERICAN              0.290       0.290        0.290
##  9 BLACK/CAPE VERDEAN                  0.5         0.5          0.5  
## 10 BLACK/CARIBBEAN ISLAND              0.433       0.433        0.433
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect

## [1] "Chi-square test result for AvgSodium"
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 69.65, df = 32, p-value = 0.0001313
## 
## Summary table for AvgPotassium 
## # A tibble: 33 × 4
##    ethnicity                     min_missing max_missing mean_missing
##    <fct>                               <dbl>       <dbl>        <dbl>
##  1 AMERICAN INDIAN/ALASKA NATIVE       0.6         0.6          0.6  
##  2 ASIAN                               0.333       0.333        0.333
##  3 ASIAN - ASIAN INDIAN                0.353       0.353        0.353
##  4 ASIAN - CHINESE                     0.156       0.156        0.156
##  5 ASIAN - KOREAN                      0           0            0    
##  6 ASIAN - SOUTH EAST ASIAN            0.25        0.25         0.25 
##  7 BLACK/AFRICAN                       0.294       0.294        0.294
##  8 BLACK/AFRICAN AMERICAN              0.143       0.143        0.143
##  9 BLACK/CAPE VERDEAN                  0.273       0.273        0.273
## 10 BLACK/CARIBBEAN ISLAND              0.233       0.233        0.233
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect

## [1] "Chi-square test result for AvgPotassium"
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sq_variable
## X-squared = 91.469, df = 32, p-value = 1.234e-07

Missingness by Mortality

## 
## Summary table for Temperature 
## # A tibble: 2 × 4
##   mortality min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 Alive           0.658       0.658        0.658
## 2 Death           0.291       0.291        0.291
## [1] "Chi-square test result for Temperature"
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 460.22, df = 1, p-value < 2.2e-16

## 
## Summary table for Glucose..whole.blood. 
## # A tibble: 2 × 4
##   mortality min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 Alive           0.460       0.460        0.460
## 2 Death           0.489       0.489        0.489
## [1] "Chi-square test result for Glucose..whole.blood."
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 2.7531, df = 1, p-value = 0.09707

## 
## Summary table for AvgChloride 
## # A tibble: 2 × 4
##   mortality min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 Alive           0.345       0.345        0.345
## 2 Death           0.412       0.412        0.412
## [1] "Chi-square test result for AvgChloride"
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 15.682, df = 1, p-value = 7.492e-05

## 
## Summary table for AvgSodium 
## # A tibble: 2 × 4
##   mortality min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 Alive           0.325       0.325        0.325
## 2 Death           0.373       0.373        0.373
## [1] "Chi-square test result for AvgSodium"
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 8.1352, df = 1, p-value = 0.004341

## 
## Summary table for AvgPotassium 
## # A tibble: 2 × 4
##   mortality min_missing max_missing mean_missing
##   <fct>           <dbl>       <dbl>        <dbl>
## 1 Alive           0.232       0.232        0.232
## 2 Death           0.196       0.196        0.196
## [1] "Chi-square test result for AvgPotassium"
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi_sq_variable
## X-squared = 5.6942, df = 1, p-value = 0.01702

Based on the summary tables, the missingness rates differ only moderately across the Age.Group, Gender, Ethnicity and Mortality groups, even though most of the chi-square tests are statistically significant (with 6377 observations, modest differences in proportions can reach significance). We therefore conclude that there is no strong association between these factors and the missingness of the variables, with the following exceptions:

  • For the ethnicity group HISPANIC/LATINO - CENTRAL AMERICAN, all values are missing.
  • For the ethnicity group HISPANIC/LATINO - MEXICAN, Temperature is missing, and half of the other variables are missing.
  • Mortality shows a strong association with the missingness of the Temperature variable (65.8% missing in the Alive group versus 29.1% in the Death group).

Imputing missing values of vital signs

Due to a high number of missing values and its lack of significance in the reference document, the Temperature variable has been removed from the data set.

Imputing missing values of lab tests

Even though ‘Glucose..whole.blood.’ shows a moderate significance between death and being alive, the reference document considers it as the most important factor. Therefore, we are going to keep this feature but remove the observations that are missing ‘Glucose..whole.blood.’.

Since the anion gap is calculated using the concentrations of sodium (Na), chloride (Cl), and bicarbonate (HCO3) in mmol/L, we will retain only the anion gap feature and drop the sodium and chloride features. Bicarbonate is considered important in the reference document, so we will keep it after imputing missing values. Additionally, the other lab tests with a low percentage of missing values (INR, Prothrombin.time, Anion.gap, and Creatinine..serum.) will have their missing values imputed with the median. However, Potassium, which has a high percentage of missing values and is not considered an important feature, will be dropped.
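The lab test steps described above can be sketched in base R as follows. The values and the name `labs` are made up, only a subset of the columns is shown, and the order of the steps (row removal first, then median imputation) is an assumption.

```r
# Toy stand-in for the lab test columns (illustrative values only).
labs <- data.frame(
  Glucose..whole.blood. = c(110, NA, 145, 98, 120),
  INR                   = c(1.1, 1.0, NA, 1.4, 1.2),
  Creatinine..serum.    = c(0.9, 1.2, 1.1, NA, 2.0),
  AvgSodium             = c(138, NA, 141, 140, NA),
  AvgChloride           = c(102, NA, NA, 104, 100),
  AvgPotassium          = c(4.1, NA, NA, 3.9, 4.4)
)

# 1. Keep only the observations that have a glucose value.
labs <- labs[!is.na(labs$Glucose..whole.blood.), ]

# 2. Drop sodium, chloride and potassium.
drop_cols <- c("AvgSodium", "AvgChloride", "AvgPotassium")
labs <- labs[, !(names(labs) %in% drop_cols), drop = FALSE]

# 3. Impute the remaining gaps with the median of the observed values.
median_cols <- c("INR", "Creatinine..serum.")
for (col in median_cols) {
  med <- median(labs[[col]], na.rm = TRUE)
  labs[[col]][is.na(labs[[col]])] <- med
}

# No missing values should remain.
colSums(is.na(labs))
```

Because the medians are computed after the rows without glucose are removed, they reflect only the patients retained for modelling; computing them beforehand would give slightly different fill values.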

Check for missing values

Missing Value Summary

|Variable                    | missing_count|
|:---------------------------|-------------:|
|subject_id                  |             0|
|gender                      |             0|
|age                         |             0|
|mortality                   |             0|
|ethnicity                   |             0|
|Heart.Rate                  |             4|
|Heart.rate.Alarm...High     |             0|
|Heart.Rate.Alarm...Low      |             0|
|SpO2.Desat.Limit            |             0|
|INR                         |             0|
|Prothrombin.time            |             0|
|Anion.gap                   |             0|
|Creatinine..serum.          |             0|
|Bicarbonate                 |             0|
|Glucose..whole.blood.       |             0|
|GCS...Eye.Opening           |             0|
|Hematocrit                  |             0|
|Platelet.Count              |             7|
|Age.Group                   |             0|
|Myocardial                  |             0|
|Rupture                     |             0|
|Thrombosis                  |             0|
|Systolic                    |             0|
|Diastolic                   |             0|
|Comb_DS                     |             0|
|Fibrillation                |             0|
|PulmonaryDisease            |             0|
|Stroke                      |             0|
|Hyperlipidemia              |             0|
|Dementia                    |             0|
|AvgBloodPressureSystolic    |             0|
|AvgBloodPressureDiastolic   |             0|
|ConsolidatedRespiratoryRate |             0|
|AvgHemoglobin               |             0|

Check for missing values

Missing Value Summary

|Variable                    | missing_count|
|:---------------------------|-------------:|
|subject_id                  |             0|
|gender                      |             0|
|age                         |             0|
|mortality                   |             0|
|ethnicity                   |             0|
|Heart.Rate                  |             0|
|Heart.rate.Alarm...High     |             0|
|Heart.Rate.Alarm...Low      |             0|
|SpO2.Desat.Limit            |             0|
|INR                         |             0|
|Prothrombin.time            |             0|
|Anion.gap                   |             0|
|Creatinine..serum.          |             0|
|Bicarbonate                 |             0|
|Glucose..whole.blood.       |             0|
|GCS...Eye.Opening           |             0|
|Hematocrit                  |             0|
|Platelet.Count              |             0|
|Age.Group                   |             0|
|Myocardial                  |             0|
|Rupture                     |             0|
|Thrombosis                  |             0|
|Systolic                    |             0|
|Diastolic                   |             0|
|Comb_DS                     |             0|
|Fibrillation                |             0|
|PulmonaryDisease            |             0|
|Stroke                      |             0|
|Hyperlipidemia              |             0|
|Dementia                    |             0|
|AvgBloodPressureSystolic    |             0|
|AvgBloodPressureDiastolic   |             0|
|ConsolidatedRespiratoryRate |             0|
|AvgHemoglobin               |             0|